**Michael Cochez Madalina Croitoru Pierre Marquis Sebastian Rudolph (Eds.)**

# **Graph Structures for Knowledge Representation and Reasoning**

**6th International Workshop, GKR 2020 Virtual Event, September 5, 2020 Revised Selected Papers**

## Lecture Notes in Artificial Intelligence 12640

## Subseries of Lecture Notes in Computer Science

Series Editors

Randy Goebel, University of Alberta, Edmonton, Canada

Yuzuru Tanaka, Hokkaido University, Sapporo, Japan

Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

Founding Editor

Jörg Siekmann, DFKI and Saarland University, Saarbrücken, Germany

More information about this subseries at http://www.springer.com/series/1244

Editors

Michael Cochez, Computer Science Department, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

Pierre Marquis, Institut Universitaire de France, CRIL, Univ. Artois & CNRS, Lens, France

Madalina Croitoru, LIRMM, Montpellier, France

Sebastian Rudolph, Fakultät Informatik, TU Dresden, Dresden, Germany

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Artificial Intelligence ISBN 978-3-030-72307-1 ISBN 978-3-030-72308-8 (eBook) https://doi.org/10.1007/978-3-030-72308-8

LNCS Sublibrary: SL7 – Artificial Intelligence

© The Editor(s) (if applicable) and The Author(s) 2021. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## Preface

The development of effective techniques for knowledge representation and reasoning (KRR) is a crucial aspect of successful intelligent systems. Different representation paradigms, as well as their use in dedicated reasoning systems, have been extensively studied in the past. Nevertheless, new challenges, problems, and issues have emerged in the context of knowledge representation in Artificial Intelligence (AI), involving the logical manipulation of increasingly large information sets (see for example Semantic Web, BioInformatics, and so on). Improvements in storage capacity and performance of computing infrastructure have also affected the nature of KRR systems, shifting their focus towards representational power and execution performance. Therefore, KRR research is faced with the challenge of developing knowledge representation structures optimized for large-scale reasoning. This new generation of KRR systems includes graph-based knowledge representation formalisms such as Constraint Networks (CNs), Bayesian Networks (BNs), Semantic Networks (SNs), Conceptual Graphs (CGs), Formal Concept Analysis (FCA), CP-nets, GAI-nets, and Argumentation Frameworks, all of which have been successfully used in a number of applications. The goal of the workshop series on Graph Structures for Knowledge Representation and Reasoning (GKR) is to bring together researchers involved in the development and application of graph-based knowledge representation formalisms and reasoning techniques.

This volume contains extended and revised selected papers of the sixth edition of GKR, under the auspices of ScaDS.AI – Center for Scalable Data Analytics and Artificial Intelligence Dresden/Leipzig, which took place jointly with ECAI 2020, the 24th European Conference on Artificial Intelligence, which was supposed to be held in Santiago de Compostela, Spain. Like ECAI, GKR had to be re-shaped into a virtual edition, given the global pandemic. This was a first, compared to previous editions of GKR held in Pasadena, USA (2009), Barcelona, Spain (2011), Beijing, China (2013), Buenos Aires, Argentina (2015), and Melbourne, Australia (2017). Still, like before, thanks to the association with a major international AI conference, the workshop provided the perfect venue for a rich and valuable exchange. As usual, the workshop submissions underwent single-blind reviewing by the program committee, each receiving between two and three reviews. On top of the extended workshop papers, this volume also contains two invited additional contributions from core GKR community members.

The scientific program of this workshop included many topics related to graph-based knowledge representation and reasoning, from sub-disciplines as diverse as conceptual graphs, formal concept analysis, graphical models, graph neural networks, concept diagrams, and others. Application areas included Smart Homes, Education, Team Formation, Enterprise Architectures, and Usage Pattern Analysis, demonstrating the wide applicability of graph-based KR methods. All in all, the sixth edition of the GKR workshop was very successful despite the unusual circumstances. The papers coming from diverse fields all addressed various issues for knowledge representation and reasoning and the common graph-theoretic background helped to bridge the gap between the different communities. This made it possible for the participants to gain new insights and inspiration.

The organizers are very grateful for the support of ECAI and we would also like to thank the program committee of the workshop for their hard work in reviewing papers and providing valuable guidance to the contributors. But, of course, GKR 2020 would not have been possible without the dedicated involvement of the contributing authors and participants.

February 2021

Michael Cochez, Madalina Croitoru, Pierre Marquis, Sebastian Rudolph

## Organization

## Workshop Chairs


### Program Committee

Pierre Bisquert, INRA & IATE, France

Mary Keeler, VivoMind, Inc., USA

Bernard Moulin, Université Laval, Canada

Galia Angelova, Bulgarian Academy of Sciences, Bulgaria

Manuel Atencia, Université Grenoble Alpes & Inria, France

Zied Bouraoui, CRIL, Université d'Artois & CNRS, France

Dan Corbett, OptimodalTechnologies, USA

Olivier Corby, INRIA, Université Côte d'Azur, France

Dragan Doder, Utrecht University, The Netherlands

Nathalie Hernandez, IRIT, Université de Toulouse, France

Robert Jäschke, Humboldt-Universität zu Berlin, Germany

Uta Priss, Ostfalia University, Germany

Ricardo Oscar Rodriguez, Universidad de Buenos Aires, Argentina

Karim Tabia, Université d'Artois & CNRS, France

Wamberto Vasconcelos, University of Aberdeen, UK

Srdjan Vesic, CRIL, CNRS & Université d'Artois, France

Nic Wilson, Insight & University College Cork, Ireland

Bruno Yun, University of Aberdeen, UK

## Organizing Body

Center for Scalable Data Analytics and Artificial Intelligence Dresden/Leipzig www.scads.de @scads

## **Extended Workshop Papers**

## **Active Semantic Relations in Layered Enterprise Architecture Development**

Matt Baxter<sup>1</sup>, Simon Polovina<sup>1(B)</sup>, Wim Laurier<sup>2</sup>, and Mark von Rosing<sup>3</sup>

<sup>1</sup> Conceptual Structures Research Group, Sheffield Hallam University, Sheffield, UK, a7033771@my.shu.ac.uk, S.Polovina@shu.ac.uk

<sup>2</sup> Université Saint-Louis, Brussels, Belgium, wim.laurier@usaintlouis.be

<sup>3</sup> LEADing Practice, Dronningmølle, Denmark, mvr@leadingpractice.com

**Abstract.** Enterprise Architecture (EA) metamodels align an organisation's business, information and technology resources so that these assets best meet the organisation's purpose. The Layered EA Development (LEAD) Ontology enhances EA practices by a metamodel with layered metaobjects as its building blocks interconnected by semantic relations. Each metaobject connects to another metaobject by two semantic relations in opposing directions, thus highlighting how each metaobject views other metaobjects from its perspective. While the resulting two directed graphs reveal all the multiple pathways in the metamodel, more desirable would be to have one directed graph that focusses on the dependencies in the pathways. Towards this aim, using CG-FCA (where CG refers to Conceptual Graph and FCA to Formal Concept Analysis) and a LEAD case study, we determine an algorithm that elicits the active as opposed to the passive semantic relations between the metaobjects resulting in one directed graph metamodel. We also identified the general applicability of our algorithm to any metamodel that consists of triples of objects with active and passive relations.

**Keywords:** Enterprise architecture frameworks *·* Layered enterprise architecture development *·* Business-IT alignment *·* Ontology *·* Semantics and reasoning *·* Conceptual structures *·* Model verification and validation

## **1 Introduction**

Enterprise Architecture (EA) is a comprehensive approach to the documentation and understanding of organisational composition to promote alignment of its business, information and technology assets [9]. The Layered Enterprise Architecture Development (LEAD) Ontology includes a metamodel that is underpinned by building blocks consisting of 91 metaobjects organised in layers and sub-layers [7,14]. Semantic relations link the metaobjects thereby integrating all aspects of business, information, and technology for any organisation. These multiple relations highlight the inbuilt interconnections and the interdependencies between the elements in an enterprise. Conceptual Graphs (CG) are a formalised method of knowledge representation based on concepts and their relations [11,12]. Formal Concept Analysis (FCA) is a principled approach to determining a conceptual hierarchy of objects and their attributes [15]. FCA interrelates objects through their related attributes, thus enabling FCA to determine and visualise a conceptual hierarchy [3]. A CG can visually display LEAD's metaobjects and their semantic relations by linking each concept to another via these relations; however, validation can be difficult due to the manual nature of the task [1]. Subsequently, processing these 'triples' (metaobject–relation– metaobject) via FCA can highlight gaps in the model, revealing an organisational gap or human error in the modelling process. Thus, while a manual review of the LEAD artefacts can identify organisational gaps, an element of mathematical rigour can be applied to the process thereby complementing LEAD through the application of CG and FCA [6,8].

#### **2 The Metamodel Diagram**

To illustrate the contribution of CG and FCA, Fig. 1 acts as our starting point. This figure represents the metamodel of a warehouse pick pack process of a UK manufacturer, based on the LEAD Enterprise Ontology referred to earlier (i.e. LEAD ID#-ES20001ALL) [13]. The metamodel was created using the Enterprise Plus (E+) software (www.enterpriseplus.tools) from LEADing Practice, a not-for-profit body of LEAD industry practitioners (www.leadingpractice.com). E+ is a comprehensive repository of LEAD reference content, including its artefacts, metaobjects, and semantic relations. The semantic relations in Fig. 1 go in two directions between each metaobject. This duality is intended in many EA metamodels, including LEAD. That is because it reveals how each metaobject views itself in relation to each other directly, and indirectly through intermediate metaobjects; hence LEAD metamodels are two-way directed graphs [9].

### **3 Activating the Metamodel**

The *CGtoFCA* algorithm converts the inherent ternary relations of CGs to the binary relations required for FCA [1]. This algorithm can also apply to other directed graph triples, including LEAD metamodels as illustrated by Fig. 1 [9]. The formal concepts can then appear in a Formal Concept Lattice (FCL). The CG-FCA software based on *CGtoFCA* thus facilitates an improved understanding of LEAD metamodels in tandem with highlighting human errors in the manual modelling process [1,9]. Further to that previous work, and in search of the metaobjects' dependence on each other, the proposed algorithm shown in Fig. 2 distinguishes the active and passive semantic relations. An active relation depicts a situation whereby a metaobject directs another, with the latter metaobject dependent on it, i.e. the passive relation. Following the identification of all the active relations, the algorithm incrementally rebuilds the model and removes unwanted semantic cycles before being visualised in an FCL.

**Fig. 1.** Warehouse pick pack metamodel (from LEAD ID#-ES20001ALL)
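As a rough illustration of how ternary triples can be recast as a binary relation, the sketch below assumes one possible encoding: the target of each triple becomes a formal object and the pair "source - relation" becomes one of its attributes. This encoding, the function name `triples_to_context`, and the choice of sample triples are assumptions made for illustration; the sketch is not the published *CGtoFCA* definition nor the CG-FCA implementation.

```python
from collections import defaultdict


def triples_to_context(triples):
    """Map (source, relation, target) triples to a binary incidence relation.

    Assumed encoding (illustrative only): the target is treated as a formal
    object and the string "source - relation" as one of its attributes.
    """
    context = defaultdict(set)
    for source, relation, target in triples:
        context[target].add(f"{source} - {relation}")
    return dict(context)


# Two triples quoted in the text, used here purely as sample input.
triples = [
    ("Platform Device", "hosts", "Application/System"),
    ("Platform Component", "serves", "Location"),
]
context = triples_to_context(triples)
```

Each key of `context` then has a set of attributes, which is the shape of input an FCA tool would typically expect.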

## **3.1 Methodology**

Using the algorithm depicted by Fig. 2, we identify and analyse the active semantic relations towards our goal of attaining an active direction graph, thus highlighting the metaobject dependencies. Strictly speaking, our algorithm is presently more of a 'pseudo-algorithm' as it requires human interpretation. For example, the step *isTransitive*(*v*) in line 19 is debatable, with one possibility being that we should just invert the relation. Formalising the algorithm so that it can be computer-executed is the subject of our ongoing work. Meanwhile, Fig. 2 fits the present purpose of our claims.

**Fig. 2.** Active semantic relations algorithm

Following Fig. 2, we reviewed each two-way semantic relation to determine which should be assigned active or passive status and created an initial active model. We examined the semantics in the narrative of the relations and identified which metaobject was directing the other and vice versa. We then rebuilt the model by reviewing each concept in turn to remove semantic cycles [9]. Where both a direct and indirect pathway exists between two metaobjects, we removed the former, as the latter illustrates the mediating metaobjects. This step enabled a deeper understanding of the interdependencies. The ternary relations were compiled as 3-column CSV files and processed by the CG-FCA application to create the binary concepts. The operations and outcomes for each metaobject CSV file were recorded in a table to document the steps taken. After successfully refactoring each concept, we generated the FCL.
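The removal of a direct pathway where an indirect one exists can be sketched as follows. This is a minimal sketch, not the procedure of Fig. 2: it assumes the triples are available as 3-column CSV text, that cycles have already been resolved, and that every direct edge with an indirect alternative is dropped in one pass. The function names, the relation label `drives`, and the sample rows are hypothetical.

```python
import csv
import io


def read_triples(csv_text):
    """Parse 3-column CSV text into (source, relation, target) tuples."""
    rows = csv.reader(io.StringIO(csv_text))
    return [tuple(cell.strip() for cell in row) for row in rows if len(row) == 3]


def successors(triples):
    """Build an adjacency map from each source to the set of its targets."""
    graph = {}
    for source, _, target in triples:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())
    return graph


def has_indirect_path(graph, start, goal):
    """True if goal is reachable from start without using the edge start -> goal."""
    seen = {start}
    stack = []
    for node in graph[start]:
        if node != goal:  # skip the direct edge itself
            seen.add(node)
            stack.append(node)
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def drop_direct_edges(triples):
    """Keep only triples whose endpoints are not also linked by a longer pathway."""
    graph = successors(triples)
    return [t for t in triples if not has_indirect_path(graph, t[0], t[2])]


# Hypothetical sample data: Role reaches Product both directly and via Process.
CSV_TEXT = """Role,drives,Process
Process,drives,Product
Role,drives,Product
"""
reduced = drop_direct_edges(read_triples(CSV_TEXT))
```

In this sample the direct triple from Role to Product is dropped, leaving the pathway through the mediating metaobject Process.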

#### **3.2 Findings**

Following the selection of the active semantic relations in the one hundred forty-seven pairs of relations, the CG-FCA application could not process the 00ActiveAll.csv file despite multiple attempts. The final attempt was aborted with the '00ActiveAll report' file having amassed a size of over 10 GB after nearly eighty-eight hours of processing time. This first experiment prevented the creation of an FCL for the initial active model.


**Table 1.** Refactoring the Capability sublayer of the metamodel – Active Organisation, Role, and Organisational Function.

We therefore attempted to identify the source of this seemingly infinite processing run by employing an iterative approach and gradually increasing the number of triples included in 00ActiveAll.csv; however, we then encountered further issues. For example, in the case of 00ActiveAllDataObject1.csv (comprising all 00ActiveAll triples up to and including the first instance of a Data Object triple), the processing time totalled just over twelve hours. Hence, there is an issue of practicality in attempting to identify the triple that is causing the seemingly infinite compilation. We had to judge when to abort the processing, as it was uncertain whether a run would never complete or was only taking longer than expected compared to the previous iteration. The decision became more difficult because processing time appears to depend on both the triple inserted and the triples already in the file, in the sense that one triple could cause a minimal increase in processing time while the impact of another could be significant. This intractability could reflect a combinatorial explosion: the number of potential outputs increases exponentially with the number of input values [2]. Nonetheless, and in light of the above experiences, we were able to proceed.
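To give a feel for how quickly pathways can multiply, the sketch below counts the simple paths between two fixed nodes of a complete directed graph. The complete graph is an assumed worst case chosen for illustration; it is not a model of the LEAD metamodel or of what the CG-FCA application computes internally, and the function name is hypothetical.

```python
from math import factorial


def simple_path_count(n):
    """Count simple paths between two fixed nodes of a complete digraph on n nodes.

    A path may pass through any ordered selection of k of the other n - 2
    nodes, giving (n - 2)! / (n - 2 - k)! paths for each k.
    """
    inner = n - 2
    return sum(factorial(inner) // factorial(inner - k) for k in range(inner + 1))


# Path counts for graphs of 4 to 10 nodes.
counts = {n: simple_path_count(n) for n in range(4, 11)}
```

Under this assumption, adding a single node multiplies the number of candidate pathways several times over, which is consistent with one extra triple having a large effect on processing time.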


**Table 2.** Refactoring the data sublayer of the metamodel – Active Data Object.

The first five metaobject CSV files contained no cycles, three of which are detailed in Table 1. Subsequently, five cycles appeared in 06ActiveLocation.csv. The decision to replace 'Product - at - Location' with 'Location - at - Product' resolved all cycles<sup>1</sup>.

<sup>1</sup> Not all the metaobjects and semantic relations appear in Fig. 1, including these two-way metaobjects and semantic relations, to maintain the figure's readability.

We also encountered cycles in the LEAD Data sublayer, with cycles ranging from one to two hundred and seventy-nine. Table 2 shows the three iterations required to resolve all cycles initially presented in 16ActiveDataObject.csv. Due to space considerations, we do not list these cycles. We identified 'Platform Component – serves – Location' as a common triple across cycles; however, an alternative pathway remained undiscovered. 'Location – has – Process – produces/consumes – Data Object' exists as a more indirect pathway. However, we deleted it as part of an operation for 08v2ActiveProcess.csv, which highlights the cumulative effect of the decisions made at each stage of refactoring. Consequently, we made alternative choices. Considering the vast number of initial cycles presented (two hundred and thirty-five) and the manual nature of the activity, it is possible that a more indirect pathway does exist but was overlooked by a human modeller.
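Finding the triple that recurs most often across cycles could in principle be done mechanically rather than by eye. The sketch below enumerates the simple cycles of a small directed graph and counts how often each edge occurs in them. It is a sketch under stated assumptions: relation labels are omitted, the enumeration is a plain depth-first search suited only to small inputs, and the function names and sample edges are hypothetical.

```python
from collections import Counter


def simple_cycles(edges):
    """Enumerate simple cycles as lists of (source, target) edges."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])
    # Each cycle is reported once, starting from its lowest-ranked node.
    order = {node: rank for rank, node in enumerate(sorted(graph))}
    cycles = []

    def extend(start, node, path, visited):
        for nxt in graph[node]:
            if nxt == start:
                cycles.append(path + [(node, nxt)])
            elif nxt not in visited and order[nxt] > order[start]:
                extend(start, nxt, path + [(node, nxt)], visited | {nxt})

    for start in graph:
        extend(start, start, [], {start})
    return cycles


def most_common_edge(edges):
    """Return the edge that appears in the most cycles, with its count."""
    counts = Counter(edge for cycle in simple_cycles(edges) for edge in cycle)
    return counts.most_common(1)[0] if counts else None


# Hypothetical edges forming two cycles that share one edge.
edges = [
    ("Platform Component", "Location"),
    ("Location", "Platform Component"),
    ("Location", "Process"),
    ("Process", "Platform Component"),
]
cycles = simple_cycles(edges)
shared = most_common_edge(edges)
```

In this sample the edge from Platform Component to Location lies on both cycles, so inverting or removing it would break both at once.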

#### **3.3 Formal Concept Lattice**

To visualise the output of CG-FCA, we created the FCL for 25ActiveInfrastructureService.csv, displayed in Fig. 3. The FCL lucidly exhibits the dependencies and driving metaobjects. A salient example is Product illustrated as being dependent on Process, which in turn is dependent on Role. In the context of the warehouse pick pack process, this dependency suggests that the product that is picked and packed is dependent on the process for doing so, which in turn is dependent on the employee that executes the process. Perhaps the most initially striking element of the FCL is the presence of Platform Component within the top-most formal concept, implying all objects below it in the diagram, i.e. its extent, are in some way dependent on it. While we might expect that technology ought to be driven by business, technology can drive business. For example, in recent years, the rise of cloud computing (a Platform Component) has driven a proliferation of decentralised business models. Accordingly, remote working is the norm and the presence of physical business components (Business Object, Location) is either minimised or eschewed entirely, depending on the industry.

A further interesting element elucidated in the FCL is 'Platform Device – hosts – Application/System', which implies that an Application/System is dependent on a Platform Device. This active pathway suggests that Platform Devices are the starting points, with the Application/System developed based on the specifications, constraints, and existence of the Platform Devices. While this makes sense, so does the opposing view, whereby Platform Device should be dependent upon Application/System because without an application to run, for what purpose does the device exist?

The presence of an empty formal concept close to the top of the lattice is also notable, and several potential explanations exist. Firstly, it could merely be a mistake in the modelling process, a probability which is heightened by the vast number of cycles encountered at some stages of the refactoring. Secondly, it could also be that the empty formal concept is irrelevant, as it exists purely as a vehicle for the facilitation of human understanding. Thirdly, and most speculatively, it could be pointing to a hitherto unnamed formal concept object, which in turn could potentially indicate a new metaobject arising from the other metaobjects and semantic relations, already validated by the LEADing Practice community.

**Fig. 3.** 25ActiveInfrastructureService lattice

**Fig. 4.** 25v2ActiveInfrastructureService lattice

To remedy Platform Component's presence in the top-most formal concept, we reviewed the FCL and identified the source as 'Platform Component – serves – Location'. For convenience, the triple was substituted with the passive triple, as were the two further triples containing the 'serves' semantic relation. Figure 4 displays the resultant FCL.

The revised FCL arguably presents a more intuitive model in the context of the warehouse pick pack process, with Location preceding Platform Component and much of the lattice being dependent upon the former. As pick pack represents the physical process of picking and packing goods at a location, a concept that pre-dates technology platforms, the revised interpretation offers a more lucid model. However, we note that due to the manual and interpretative nature of the exercise, other modellers could feasibly reach different conclusions.

### **4 Discussion**

#### **4.1 Implications**

We have demonstrated that an active direction graph can be attained via the identification of active semantic relations, rebuilding of concepts, and visualisation via an FCL. The proposed algorithm depicted by Fig. 2 enabled us to elaborate on the identification and rebuilding stages, supported by the *CGtoFCA* algorithm implemented in the CG-FCA application. The ensuing FCL presented a clear view of metaobject dependency and driving forces, consequently providing a deeper understanding of the LEAD framework both generally and in the context of a warehouse pick pack process.

Furthermore, the presence of an unnamed concept in the 25ActiveInfrastructureService lattice could prompt a further, deeper examination of the semantics, potentially leading to refined semantic relations or a new metaobject. These enhancements would underpin the rigour of LEAD, by revealing which metaobjects are consistently driving others due to their active and passive semantic relations. It is in this scenario that the active-directed graphs visualised as FCLs provide value, due to their explicit ordering of driving forces and dependencies. It is conceivable that such diagrams could, by facilitating a more in-depth understanding, provide business users with direction when attempting to resolve issues or enact continuous improvement. For example, for an organisation wishing to improve the KPIs of a Business Service, the active FCL outlines all other metaobjects on which the Business Service is dependent, as highlighted by Fig. 5.

In the context of the warehouse pick pack process, if we consider the 'picking' Business Service, the active FCL suggests this is dependent upon Process. Review of the decomposition of the Process metaobject shows the various process steps undertaken by the Warehouse Admin. Many of these steps must be completed before the Picker can begin picking, which supports the notion that the picking Business Service's KPIs, e.g. picks per hour, could be adversely impacted by the process on which it depends.
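Listing every metaobject on which a given metaobject depends amounts to a reachability query over the active relations. The sketch below assumes the active model is available as (driver, dependent) pairs; the function name and the sample edges, which only echo the dependencies mentioned in the text, are illustrative assumptions rather than the content of Fig. 5.

```python
def dependencies(edges, metaobject):
    """Return every metaobject that the given one depends on, directly or not."""
    drivers = {}
    for driver, dependent in edges:
        drivers.setdefault(dependent, set()).add(driver)
    found, stack = set(), [metaobject]
    while stack:
        for driver in drivers.get(stack.pop(), ()):
            if driver not in found:
                found.add(driver)
                stack.append(driver)
    return found


# Hypothetical (driver, dependent) pairs echoing the dependencies in the text.
edges = [
    ("Role", "Process"),
    ("Process", "Business Service"),
    ("Process", "Product"),
]
business_service_deps = dependencies(edges, "Business Service")
```

A business user could read the resulting set as the list of places to look when a Business Service KPI underperforms.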

**Fig. 5.** Activated Business Service object and attributes

#### **4.2 Current Limitations**

We are aware that our choice of semantic relations from E+ might call into question the external validity of the work. From our experiments, we can quantify the scale of absent semantic relations as fifty-four out of two hundred ninety-four for the selected metaobjects. However, the number of incorrectly identified semantic relations (e.g. process – delivered by – Business Service) is unknown at this time. Both issues affect the selection process: the former requires assumptions that are potentially erroneous, and the latter is uncertain by nature. These considerations are pertinent as they influence the active vs. passive selection, which in turn impacts all pathways associated with the triple. Inclusion of a triple from all one hundred forty-seven pairs of semantic relations potentially contributed to the issues with the CG-FCA application, reflecting the combinatorial explosion.

Similarly, the inclusion of triples with identical two-way semantic relations, e.g. Application Task and Data Table, increased the complexity of the task, subsequently increasing the likelihood of errors. While we based our approach on the proposed algorithm and selected the TDV relation in these instances based on sound logic, alternative methods may exist. The omission of all identical two-way semantic relations would provide consistency but also prevent the explication of all pathways containing those triples. The manual nature of the exercise should also be considered, especially where many cycles occur. Determining which triple is most common across cycles by eye is imprecise when reviewing such a substantial data set.

Furthermore, we chose pathways based upon our intuitive knowledge of the LEAD framework. For example, during refactoring of 08ActiveProcess.csv, three triples were deleted ('Process – produces/consumes – Data Object', 'Process – produces/consumes – Information Object', and 'Application Task – partially or fully automates – Process') based on the assumption that other pathways with more mediating metaobjects existed. This decision was based on the distance between the metaobjects in the LEAD layers and was later validated with the discovery of 'Data Object – influences the design of – Application Task – uses – Data Table – encapsulates – Information Object – specialises as – Application Function – describes the automation of – Process' in 16ActiveDataObject report.

However, a more precise approach might be preferable, such as a tool that accepts an input and output metaobject in addition to all other metaobjects within the set, before returning a list of pathways in descending length order. If an algorithm comprises both logic and control, we can improve its control element [5]. The modeller acting as a 'manual' control by 1) being aware of the effect of a more significant number of triples and therefore limiting them, and 2) determining triple commonality across cycles by eye, is not optimal. As we have demonstrated, the proposed algorithm assisted significantly; based on our experiences, there are routes to refine it further. Therefore, the approach could be improved if the refined version complemented the CGtoFCA algorithm implemented in the CG-FCA application. Hence, the refined version duly implemented alongside CG-FCA can account for one or both of these issues.
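A tool of the kind described, taking an input and an output metaobject and returning the pathways between them in descending length order, might look like the sketch below. It assumes the model is given as (source, target) pairs and that only simple pathways (no repeated metaobject) are of interest; the function name and the sample edges are hypothetical.

```python
def pathways(edges, start, goal):
    """List all simple pathways from start to goal, longest first."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
    found = []

    def walk(node, path):
        if node == goal:
            found.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a metaobject
                walk(nxt, path + [nxt])

    walk(start, [start])
    return sorted(found, key=len, reverse=True)


# Hypothetical edges: one direct and one mediated pathway from Role to Product.
edges = [("Role", "Process"), ("Process", "Product"), ("Role", "Product")]
role_to_product = pathways(edges, "Role", "Product")
```

Listing the longest pathway first puts the one with the most mediating metaobjects at the top, which matches the preference for indirect over direct pathways during refactoring. The exhaustive search is subject to the same combinatorial growth noted earlier, so it is only practical for small sets of triples.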

#### **4.3 Future Research**

We started with a(n) (ontology-based) metamodel, composed of concepts that were related by two-way, or bidirectional, relationships. The large majority of these bidirectional relationships seemed to be active in one direction and passive in the other. The LEAD metamodel reveals which aspects of business (the concepts) act upon or impact on others. In the context of change management (but also of the day-to-day management of a company) it is important to be able to distinguish between the causes (active) and the effects (passive) of management issues (in day-to-day management) and identify the levers (active) needed to "pull" in order to realise the wanted change, while accounting for the passive effects that pulling the levers might have.

Where semantic relationships were two-way active or two-way passive, we needed to evaluate whether they could be reformulated as active-passive couples, i.e. to turn the present pseudo-algorithm (Fig. 2) into one that can be computer-implemented. With help from software libraries or web services that, for example, allow us to identify and rephrase passive and active relationships (e.g. Grammarly, www.grammarly.com, or DeepL, www.deepl.com), the pseudo-algorithm could be automated as real executable code.

Our formal analysis of the metamodel has two main objectives. First, optimising the hands-on nature of the metamodel as a management tool: by separating the active from the passive semantics it is easier to find causes of a management issue and the levers that act upon this problem (that needs to be addressed) using the active semantics. Additionally, the passive semantics allow for identifying the effects of this management issue (and building the business case for the change). Moreover, the passive semantics will allow for identifying the (positive and negative) side-effects of the change, as the levers that are chosen or pulled will have an impact on the change goal, but also on other aspects of management that are actively affected. As such this clear "chain of command" is expected to both help identify the levers to obtain a desired change and minimise its adverse effects. Second, in ontology engineering there is an expectation that directed graphs with active and passive semantic relations should be isomorphic, i.e. a passive directed graph is the flip side of an active one. However, where they are not, there needs to be an elaboration. Is the "chain of command" thus asymmetric, and why, or are there missing concepts? As such this formal approach could be combined with OntoClean, METHONTOLOGY or other ontology engineering approaches [4,10].
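The expectation that the passive graph is the flip side of the active one can be checked mechanically once each active relation is paired with its passive counterpart. The sketch below is a simplified test of edge reversal rather than a general graph isomorphism check. The passive relation names (`is hosted by`, `is served by`), the function names, and the sample data are assumptions for illustration and are not taken from LEAD.

```python
def flip(active, passive_name):
    """Reverse each active triple and rename its relation to the passive form."""
    return {(target, passive_name[relation], source)
            for source, relation, target in active}


def unmatched(active, passive, passive_name):
    """Report where the passive triples deviate from the flipped active triples."""
    expected = flip(active, passive_name)
    return {
        "missing_passive": expected - set(passive),
        "extra_passive": set(passive) - expected,
    }


# Hypothetical passive names for two active relations quoted in the text.
passive_name = {"hosts": "is hosted by", "serves": "is served by"}
active = [
    ("Platform Device", "hosts", "Application/System"),
    ("Platform Component", "serves", "Location"),
]
passive = [("Application/System", "is hosted by", "Platform Device")]
report = unmatched(active, passive, passive_name)
```

Any triple reported as missing or extra marks a place where the two graphs are not mirror images and where further elaboration would be needed.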

#### **5 Conclusion**

We have shown that by distinguishing the active semantic relations in bidirectional (two-way directed) graphs we can identify the dependencies in metamodels from their metaobject and semantic relation building blocks. Furthermore, we outlined how our approach provides value to industry practice, thus promoting a deeper and more widespread understanding of Layered Enterprise Architecture Development (LEAD) and the LEAD Enterprise Ontology.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **A Belief Update System Using an Event Model for Location of People in a Smart Home**

Marie Bernert and Fano Ramparany

Orange Labs, 28 Chemin du Vieux Chêne, 38240 Meylan, France
fano.ramparany@orange.com

**Abstract.** Artificial Intelligence applications often need to maintain a knowledge base about the observed environment. In particular, when the current knowledge is inconsistent with new information, it has to be updated. Such inconsistency can be due to erroneous assumptions or to changes in the environment. Here we consider the second case and develop a knowledge update algorithm based on event logic that takes into account the constraints according to which the environment can evolve. These constraints take the form of events that modify the environment in a well-defined manner. The belief update triggered by a new observation is thus explained by a sequence of events. We then apply this algorithm to the problem of locating people in a smart home and show that taking into account past information and movement constraints improves location inference.

**Keywords:** Belief revision · Event logic · Semantic reasoning · Smart home · IoT

## **1 Introduction**

A smart home should provide services adapted to its inhabitants. Indeed, the users' needs strongly depend on who is present in the house, where people are located, what they are doing, at which time of the day and on which day of the week, and so on. It is thus crucial to infer this context from the data provided by the house equipment. For example, concerning the "where" part of the context, the precise location of an occupant in the house can be used, among other things, to choose a device to communicate with this occupant, or to suggest activities linked to this location. However, location information from sensors is often sparse and imprecise, due to the cost of equipping a house with numerous devices and the rejection of overly intrusive devices such as cameras. As an example of easily available but vague information, a motion detector indicates that at least one person is present in a room. Similarly, a smartphone WiFi connection indicates that its owner is in or near the house. In spite of this vagueness, useful information can be inferred by tracking location information over time and taking into account the house topology. More generally, in many cases knowledge about an environment must be inferred from only sparse information. However, knowing the evolution constraints of the environment and accumulating information over time can lead to substantial knowledge about the environment, as we do in our everyday life. Our goal is to implement an algorithm that takes location information from the sensors of a smart home and infers people's locations from this information, taking into account constraints on moves. More generally, we propose an algorithm able to revise knowledge taking into account well-defined evolution constraints.

## **2 Use Case Example**

In this section we present a use case scenario, defined as the main test case to design and test our location algorithm. In this scenario, we consider a simple house composed of four rooms and inhabited by two people: Alice and Bob. The four rooms are the entrance, connected to the outside; the kitchen, connected to the entrance; the living room, also connected to the entrance; and the bedroom, connected to the living room (see Fig. 1). The home is equipped with sensors that give us information about people's locations:


Given the house topology and its sensor equipment, we consider the following seven-step scenario:


When considering only the last step of the scenario, the location devices inform us that somebody is in the kitchen, nobody is in the entrance or the living room, and Bob is somewhere in the house. We infer from this information that Alice or Bob is in the kitchen, Bob is in the kitchen or the bedroom, and Alice is in the kitchen, in the bedroom or outdoors. However, when considering all the sensor information from the beginning of the scenario together with the room adjacency constraints, one can easily infer that Alice is in the bedroom and Bob is in the kitchen. This simple example shows that it is possible to infer much more information by taking into account the house's topology and past information. This use case can be used to discriminate an algorithm that uses such a strategy from one that does not.

## **3 Related Work**

#### **3.1 Logical Formalism**

To address our problem, we need a logical formalism to deal with events and evolving facts. Many logical systems have been defined for this purpose. Here we present some of them.

**Fig. 1.** Example of a smart home, equipped with simple location devices

**Dynamic Logic.** Dynamic logic was originally developed to reason about computer programs, in particular to verify their correctness. Hoare's logic constitutes a well-known example of programming logic [6]. It was later realized that such a logic could be used for other applications, and it was then generalized to dynamic logic. Meyer gives a review of different dynamic logic applications [7]. In the context of our problem, we are interested in dynamic logic used as a logic of action.

Dynamic logic of action is built on a logical language $\mathcal{L}_{DL}$ and an action language $\mathcal{L}_{ACT}$. $\mathcal{L}_{DL}$ includes a set of propositional atoms $P$ and is closed under the usual syntactic rules. $\mathcal{L}_{ACT}$ includes a set of atomic actions $A$ and is closed under rules such as sequential composition of actions ($\alpha; \beta$), choice of action ($\alpha + \beta$), and arbitrary finite repetition of an action ($\alpha^*$), with $\alpha, \beta \in A$. In addition to the usual syntactic rules, $\mathcal{L}_{DL}$ is closed under the following rule: if $\varphi \in \mathcal{L}_{DL}$ and $\alpha \in \mathcal{L}_{ACT}$, then $[\alpha]\varphi$ and $\langle\alpha\rangle\varphi$ are in $\mathcal{L}_{DL}$.

An interpretation for $\mathcal{L}_{DL}$ is a structure $M$ of the form $(S, \pi, r)$, where $S$ is a non-empty set of states, $\pi : S \times P \to \text{BOOL}$ is a truth assignment function that associates a truth value with each pair of a state and an atomic proposition, and $r : \mathcal{L}_{ACT} \to \mathcal{P}(S \times S)$ is a function that associates a state transition relation with each action. Given an interpretation $M = (S, \pi, r)$ and a state $s \in S$, the truth value of a formula $\varphi \in \mathcal{L}_{DL}$ is defined by:


Dynamic logic of action makes it possible to write formulas such as $\varphi \to [\alpha]\psi$, meaning that if $\varphi$ is true, then executing action $\alpha$ leads to $\psi$ being true. It is thus possible to describe the result of an action. However, this formalism cannot deal with an explicit timeline: it is not possible, for example, to assert that an action occurred at a specific time point, or that a proposition is true during a given time interval.
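To make the reading of $[\alpha]\varphi$ and $\langle\alpha\rangle\varphi$ concrete, here is a minimal sketch of the usual Kripke-style evaluation over a finite interpretation $(S, \pi, r)$. The tuple encoding of formulas and actions, the function names and the two-state example are illustrative assumptions, not part of the formalism reviewed above; `r` is given for atomic actions only and extended to compound actions in the standard way.

```python
# Sketch only: formulas and actions are encoded as nested tuples (an assumption).
def relation(action, r, states):
    """Transition relation of a possibly compound action."""
    if isinstance(action, str):                      # atomic action
        return set(r.get(action, ()))
    op = action[0]
    if op == "seq":                                  # alpha; beta
        first = relation(action[1], r, states)
        second = relation(action[2], r, states)
        return {(s, u) for (s, t) in first for (t2, u) in second if t == t2}
    if op == "choice":                               # alpha + beta
        return relation(action[1], r, states) | relation(action[2], r, states)
    if op == "star":                                 # alpha*: reflexive transitive closure
        step = relation(action[1], r, states)
        closure = {(s, s) for s in states}
        while True:
            bigger = closure | {(s, u) for (s, t) in closure
                                for (t2, u) in step if t == t2}
            if bigger == closure:
                return closure
            closure = bigger
    raise ValueError(f"unknown action: {action!r}")


def holds(phi, s, model):
    """Truth value of formula phi at state s in model (S, pi, r)."""
    states, pi, r = model
    if isinstance(phi, str):                         # propositional atom
        return pi.get((s, phi), False)
    op = phi[0]
    if op == "not":
        return not holds(phi[1], s, model)
    if op == "implies":
        return (not holds(phi[1], s, model)) or holds(phi[2], s, model)
    if op in ("box", "dia"):                         # [alpha]phi and <alpha>phi
        successors = [t for (u, t) in relation(phi[1], r, states) if u == s]
        results = (holds(phi[2], t, model) for t in successors)
        return all(results) if op == "box" else any(results)
    raise ValueError(f"unknown formula: {phi!r}")


# Hypothetical two-state model: one action "go" from the entrance to the kitchen.
STATES = {"entrance", "kitchen"}
PI = {("kitchen", "in_kitchen"): True}
R = {"go": {("entrance", "kitchen")}}
MODEL = (STATES, PI, R)
```

In this toy model, `("box", "go", "in_kitchen")` holds at the entrance because every `go`-successor satisfies the atom, and it holds vacuously at the kitchen, where `go` has no successor.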

**Event Logic.** In [3], Allen developed a temporal logic, based on predicate logic, to reason about actions. This logical formalism involves properties, events, and time intervals. Allen defines a set of thirteen mutually exclusive primitive relations between intervals (see Fig. 2), originally developed in [2]. This set is $\mathcal{R}_{Allen} = \{=, <, >, m, o, d, s, f, mi, oi, di, si, fi\}$, where $=$, $<$, $>$, $m$, $o$, $d$, $s$, $f$ respectively stand for "equals", "before", "after", "meets", "overlaps", "during", "starts", "finishes", and each $xi$ is the inverse of $x$. According to these relations, the predicates STARTS(I, J), FINISHES(I, J), BEFORE(I, J), OVERLAP(I, J), MEETS(I, J) as well as EQUAL(I, J) are defined for time intervals I and J. Two other predicates are defined. The first is HOLDS(p, I), where p is a property and I a time interval, meaning that property p is true during the interval I. The second is OCCURS(e, I), where e is an event and I a time interval, meaning that the event e occurs during the interval I; in other words, e begins at the beginning of I and ends at the end of I. Whereas a property p holding during an interval I also holds for all sub-intervals of I, an event e cannot be split, and its occurrence coincides exactly with the interval I. Allen also defines some other predicates about processes, causality and actions, which we will not detail here.

**Fig. 2.** Allen's primitive relations between intervals
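As an illustration of these thirteen relations, the following sketch classifies a pair of intervals. It assumes that an interval is a pair of numbers `(start, end)` with `start < end`; the function name and this encoding are illustrative choices, not taken from [2] or [3].

```python
def allen_relation(i, j):
    """Return the Allen relation symbol r such that 'i r j' holds.

    Intervals are (start, end) pairs with start < end (an assumption of this sketch).
    """
    (a, b), (c, d) = i, j
    if b < c:
        return "<"            # i is before j
    if d < a:
        return ">"            # i is after j
    if b == c:
        return "m"            # i meets j
    if d == a:
        return "mi"           # j meets i
    if a == c and b == d:
        return "="
    if a == c:
        return "s" if b < d else "si"   # same start
    if b == d:
        return "f" if a > c else "fi"   # same end
    if c < a and b < d:
        return "d"            # i is strictly inside j
    if a < c and d < b:
        return "di"           # j is strictly inside i
    return "o" if a < c else "oi"       # proper overlap
```

Exactly one branch applies to any pair of proper intervals, which reflects the mutual exclusivity of the relations.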

Following this idea of reasoning about event occurrences, Siskind developed a logic known as event logic [8]. A language $\mathcal{L}_{EL}$ of event-logic expressions is defined as follows. A finite set $O$ of constant symbols and a finite set $E$ of primitive event-type symbols are given. An atomic event-logic expression is a primitive event-type symbol of arity $n$ applied to a sequence of $n$ constants. An event-logic expression is either an atomic event-logic expression or a compound expression $\neg\varphi$, $\varphi \vee \psi$, $\varphi \wedge_R \psi$ or $\Diamond_R \varphi$, with $R \subseteq \mathcal{R}_{Allen}$ and $\varphi$ and $\psi$ event-logic expressions. An event-occurrence formula has the form $\varphi@I$, where $\varphi \in \mathcal{L}_{EL}$ and $I$ is a time interval. An interpretation $M$ is a function that maps each primitive event-type symbol of arity $n$ to a subset of $\mathcal{I} \times O^n$, where $\mathcal{I}$ is the set of all time intervals. The truth value of the formula $\varphi@I$ relative to an interpretation $M$ is defined by:


It is possible to define a primitive event-type, denoted φ, from a predicate φ. The event occurrence $\varphi@I$ is true if the predicate φ is true at each point of the interval $I$. This unifies the concepts of events and properties defined by Allen in [3]. As an example, let us assume that we are given a set of persons, a set of rooms, and a predicate symbol $IsIn$. The predicate $IsIn(p, r)$, for a person $p$ and a room $r$, is true at time $t$ if $p$ is present in $r$ at time $t$. We can define the compound event-type expression $Move(p, r_1, r_2)$, for a person $p$ and rooms $r_1$ and $r_2$, as follows: $Move(p, r_1, r_2) = \Diamond_{\{m\}} IsIn(p, r_1) \wedge_{\{=\}} \Diamond_{\{mi\}} IsIn(p, r_2) \wedge_{\{=\}} \neg IsIn(p, r_1) \wedge_{\{=\}} \neg IsIn(p, r_2)$. This states that a person $p$ is moving from room $r_1$ to room $r_2$ during interval $I$ iff $p$ is in $r_1$ just before $I$, in $r_2$ just after $I$, and in neither $r_1$ nor $r_2$ during $I$.

#### **3.2 AGM Model**

The AGM model was developed by Alchourrón, Gärdenfors and Makinson as a framework for belief revision [1]. Its main goal is to define desirable properties of a revision operation on a belief set. A good introduction to the AGM model is given by Fermé [5]. Here we detail the main features of the AGM model.

A belief set, or theory, $K$ is a subset of a logical language $\mathcal{L}$ that is closed under logical consequence. Denoting by $Cn$ the consequence operation, we thus have $K = Cn(K)$. Given a belief set $K$ and a statement $x$, $x$ is either believed if $x \in K$, disbelieved if $\neg x \in K$, or unsettled otherwise. The purpose of belief revision is to add statements to or retract statements from a belief set. The AGM model defines the possible revision operations on a belief set and gives some postulates these operations should satisfy. Given a belief set $K$ and a statement $x$, three operations are possible:


Expansion can be easily defined as $K + x = Cn(K \cup \{x\})$, effectively adding $x$ to $K$ without removing or adding information unnecessarily. Moreover, as in this case $\neg x \notin K$, if $Cn(x)$ is consistent the result is also consistent. When $\neg x \in K$, adding $x$ to $K$ is a revision operation. It is necessary to first remove $\neg x$ from $K$ before adding $x$. The revision operation can be defined using the contraction operation through the Levi identity: $K * x = Cn((K - \neg x) \cup \{x\})$. If the contraction is consistent and successful, then the revision operation is also consistent and successful.

The key to the problem is thus the contraction operation. The AGM model defines six main postulates a contraction operation should satisfy:


As a tool to define contraction, we denote by $K \perp x$ the set of all maximal subsets of $K$ that do not imply $x$. A first naive approach to defining contraction, called maxichoice contraction, is to choose $K - x$ to be one element of $K \perp x$. Maxichoice contraction has some disconcerting properties. In particular, when revision is defined through the Levi identity, $K * x$ is always complete, which means that no statement is unsettled. Thus, the belief set generated by maxichoice contraction and revision might be considered "too big". A second approach, called meet contraction, is to define $K - x$ as the intersection of all elements of $K \perp x$. In this case, on the contrary, the result might be considered "too small". Indeed, we have $K - x = K \cap Cn(\neg x)$ and $K * x = Cn(x)$. In between, contraction can be defined as a partial meet contraction, which consists in selecting the most important elements of $K \perp x$. Let $\gamma$ be a selection function such that $\gamma(K \perp x)$ is a non-empty subset of $K \perp x$. Partial meet contraction is defined as $K - x = \bigcap \gamma(K \perp x)$, and partial meet revision is defined through the Levi identity. Maxichoice contraction and meet contraction are extreme cases of partial meet contraction, where $\gamma$ selects respectively one element or all elements of $K \perp x$. It can be shown that a contraction operation satisfies the six postulates if and only if it is a partial meet contraction. There is thus no general way to define contraction (and revision): contraction requires making some choice about which beliefs should be preserved.
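The difference between maxichoice and meet contraction can be observed on a small propositional example. The sketch below does not manipulate sets of formulas directly. Instead it assumes the usual semantic view in which a belief set over finitely many atoms is represented by its set of models, so that each element of $K \perp x$ corresponds to the models of $K$ plus one model of $\neg x$, and intersecting belief sets corresponds to taking the union of their model sets. The atoms, function names and selection functions are illustrative assumptions.

```python
from itertools import combinations

ATOMS = ("p", "q")
# A world is the frozenset of atoms that are true in it.
WORLDS = frozenset(frozenset(c) for n in range(len(ATOMS) + 1)
                   for c in combinations(ATOMS, n))


def models(formula):
    """Model set of a formula given as a predicate on worlds."""
    return frozenset(w for w in WORLDS if formula(w))


def remainders(K, x):
    """Model sets of the elements of K ⊥ x (K and x are model sets)."""
    if not K <= x:                  # K does not imply x: K itself is maximal
        return [K]
    return [K | {w} for w in WORLDS - x]   # add exactly one counter-model of x


def partial_meet_contraction(K, x, select):
    rem = remainders(K, x)
    if not rem:                     # x is a tautology: nothing can be retracted
        return K
    # Intersection of belief sets = union of their model sets.
    return frozenset().union(*select(rem))


def revise(K, x, select):
    """Levi identity: contract by not-x, then expand by x."""
    not_x = WORLDS - x
    return partial_meet_contraction(K, not_x, select) & x


K = models(lambda w: "p" in w and "q" in w)          # believe p and q
not_p = models(lambda w: "p" not in w)               # new information: not p

full_meet = revise(K, not_p, select=lambda rem: rem)       # keep all remainders
maxichoice = revise(K, not_p, select=lambda rem: rem[:1])  # keep a single one
```

Under these assumptions, `full_meet` has exactly the models of $\neg p$, i.e. $K * x = Cn(x)$ and everything else has been forgotten, while `maxichoice` has a single model and therefore settles $q$ as well, which illustrates the "too small" and "too big" outcomes.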

The AGM model is a general framework for belief revision that specifies the properties a revision operation should satisfy. However, it does not detail a practical implementation of these operations. In particular, the contraction operation is not trivial to define.

#### **3.3 Truth Maintenance Systems**

Truth maintenance systems (TMS) were introduced by Doyle in [4]. As for most belief revision systems, Doyle's TMS tackles the problem of revising a belief set when new information introduces a contradiction. The two main principles of the TMS are to use a non-monotonic logic, where some facts are believed unless proved false, and to keep track of the reasons why a fact is believed.

The TMS works in duality with a problem solver, which provides statements and justifications for these statements. The goal of the TMS is to decide which statements should be believed, depending on their justifications. Within the TMS, statements are represented by nodes that are said to be "in" if the statement is believed or "out" otherwise. One node is marked as a contradiction and should not be "in". A justification for a node consists of two parts: an in-list and an out-list. A justification makes a node "in" iff all nodes in the in-list are "in" and all nodes in the out-list are "out". The out-list constitutes the non-monotonic part of the TMS. For example, in natural language, a justification for "Titi can fly" can be: "If Titi is a bird, Titi can fly, unless it is a penguin". In the TMS formalism, the node "Titi can fly" has a justification with the in-list "Titi is a bird" and the out-list "Titi is a penguin". There is a particular kind of node in the TMS, called assumptions, which are nodes justified by an out-list containing their negation. An update of the TMS is triggered when a new justification is added. When the contradiction node becomes "in" after an update, a backtracking procedure is called to make the contradiction "out" again. This is done by finding the assumptions that justify (possibly indirectly) the contradiction and making one of these assumptions "out" by adding a justification.
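The in-list/out-list mechanism can be sketched in a few lines. The code below is not Doyle's labelling procedure: it is a naive fixed-point relabelling, assumed here only for illustration, which works on simple non-circular examples such as the one about Titi and gives up after a bounded number of rounds otherwise. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Justification:
    node: str                          # the statement being justified
    in_list: frozenset = frozenset()   # nodes that must be "in"
    out_list: frozenset = frozenset()  # nodes that must be "out"


def is_valid(j, labelled_in):
    """A justification is valid iff its in-list is "in" and its out-list is "out"."""
    return j.in_list <= labelled_in and not (j.out_list & labelled_in)


def label(justifications, max_rounds=100):
    """Return the set of "in" nodes, found by naive iteration (sketch only)."""
    labelled_in = set()
    for _ in range(max_rounds):
        updated = {j.node for j in justifications if is_valid(j, labelled_in)}
        if updated == labelled_in:
            return labelled_in
        labelled_in = updated
    raise RuntimeError("no stable labelling found within max_rounds")


BASE = [
    Justification("Titi is a bird"),   # premise: empty in-list and out-list
    Justification("Titi can fly",
                  in_list=frozenset({"Titi is a bird"}),
                  out_list=frozenset({"Titi is a penguin"})),
]
```

With `BASE` alone, "Titi can fly" is labelled "in"; adding the premise `Justification("Titi is a penguin")` invalidates its only justification, so the node becomes "out" without any other belief being retracted.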

The TMS approach explains a contradiction by the fact that some assumptions were made that are not true. The contradiction is solved by disbelieving these assumptions. In our case, a contradiction can arise if a fact that was previously true becomes false because the environment is evolving. This difference makes the TMS unsuitable for our problem.

## **4 Our Contribution**

#### **4.1 Algorithm Overview**

Our algorithm assumes that sensors provide information that holds during a time interval. The timeline is divided into time intervals, each corresponding to a set of observations that hold during the entire interval. It is also possible to have intervals during which no information is given. In addition to these observations about the environment, we are given a set of events that can make the environment evolve. We assume that these events modify our knowledge in a relatively simple way, such that, given a belief set holding before an event, we know what new belief set holds after the event occurs. The goal is to infer facts about the current environment from the consecutive observations and the possible event sequences explaining these observations.

The principle of the algorithm is to explore all possibilities of event sequences compatible with the past and current observations. Possibilities are explored by examining the consequences of adding an event to a sequence that has already been considered. The added event should be compatible with the current observations and what had already been inferred from the previous sequence hypothesis. Each time the observations change, new possibilities can be explored. Once all possibilities have been explored, we can infer that a fact about the environment is true if it is true considering every event sequence hypothesis, or possible if it is possible for at least one sequence hypothesis.

#### **4.2 Logical Formalism**

For the purpose of our problem, we found it practical to reason about continuous time and punctual events (events occurring at a precise time point). Indeed, for simplicity we assume that properties, such as the room position of a person in the house, are discrete and always well defined. As a consequence, changes on properties, such as the move of a person from one room to an adjacent room, are punctual events. Event logic provides useful operators to reason about event occurrences over time. However, for more flexibility, we chose to define a logic based on predicate logic as it was done in [3].

The main idea is to take a classical logic, later called the base logic, and augment it with time and events to construct a dynamic logic. We simply assume that the base logic contains conjunction and disjunction. Formulas from the base logic will later be called properties. A finite set $E$ of punctual event symbols is given. An interpretation for our punctual event logic consists of two main elements:


We define two main predicates to write event logic formulas:


We also define some other useful derived predicates:


This formalism gives us a framework to design our algorithm.

#### **4.3 Transition Graph Structure**

Let us assume that we made a series of observations $O_0, \ldots, O_n$ during consecutive intervals $I_0, \ldots, I_n$. The observations $O_0, \ldots, O_n$ are sets of properties. We can thus write, for each $k$, $Holds(O_k, I_k)$. The intervals $I_k$ are called observation intervals. Note that the observation intervals do not necessarily coincide with the intervals during which the base logic model does not change. The observations can change without an event occurring, and an event can occur without inducing a change in the observations. We assume that the transition model is known. However, the succession of event occurrences is unknown. Our goal is to infer properties given the observations and the transition model.

Let us focus on one particular time point $t$ in an interval $I_k$. The main idea of our algorithm is to make hypotheses about the event sequences that occurred from the starting time point $t_0$ until $t$. For this purpose, we associate each event sequence $s$ with its transition function $T_s$, which is the composition of the transition functions associated with each of its events. Given a transition function $T$, we denote by $Seq(T)$ the set of event sequences $s$ such that $T_s = T$. Given a time point $t$, an observation interval $I_k$ and a transition function $T$, we consider, as an event sequence hypothesis, the formula, denoted $N_T^k(t)$, stating that $t$ is in $I_k$ and that the event sequence between $t_0$ and $t$ belongs to $Seq(T)$:

$$N_T^k(t) = t \in I_k \land \exists s \in Seq(T),\ OccursSeq(s, [t_0; t]) \tag{1}$$

The formula $N_T^k(t)$ will later be called a belief node, as we will build a graph structure on these hypotheses. Let us first notice that, for a given $k$, the disjunction of the $N_T^k(t)$ over all transition functions $T$ is simply the statement that $t$ belongs to $I_k$. For a given $k$, the observation interval $I_k$ can thus be associated with the set of belief nodes $N_T^k(t)$ with $T$ ranging over all possible transition functions.

Let us now build a graph structure on belief nodes. For this purpose, we build an equivalent formula for $N_T^k(t)$ using predecessor belief nodes. Given a transition function $T$, we consider $Pred(T)$, the set of pairs $(T', e)$, with $T'$ a transition function and $e$ an event symbol, such that $T = T_e \circ T'$. The hypothesis $N_T^k(t)$ is true if and only if one of the following hypotheses is true:


Thus, each belief node can be written as a disjunction of hypotheses involving other belief nodes, which are predecessor belief nodes through different events. A predecessor belongs to the same observation interval when the last event occurred in this interval, or to the previous observation interval when the last event occurred at the time point between the two intervals (in this case the transition can correspond to no event).

The belief nodes can thus be organized into a graph, which we call the transition graph, where vertices are belief nodes and edges correspond to events (see Fig. 3). The edges have two different types: internal edges, linking nodes corresponding to the same observation interval, and external edges, linking nodes from two consecutive observation intervals. We thus label the edges with transition symbols constructed from the event symbols and taking into account the internal or external nature of the edge. We denote this set of transition symbols by $E_{tr} = E_{in} \cup E_{ex}$, with $E_{in} = \{(e, in), e \in E\}$ the set of internal transition symbols and $E_{ex} = \{(e, ex), e \in E \cup \{idle\}\}$ the set of external transition symbols. The successor of a node $N_T^k$ through an edge labeled by $e \in E_{tr}$ can be easily computed as $succ(N_T^k, e) = N^k_{T_e \circ T}$ if $e \in E_{in}$ and $succ(N_T^k, e) = N^{k+1}_{T_e \circ T}$ if $e \in E_{ex}$.

A walk in the transition graph starting from the initial node $N^0_{Id}$ gives a sequence of events whose positions relative to the observation intervals are known. The definition of the hypothesis $N_T^k(t)$ can be refined by stating that there exists some walk $w$ from $N^0_{Id}$ to $N_T^k(t)$ such that the events occurring between $t_0$ and $t$ correspond to the event sequence described by $w$, with the correct position in the observation intervals.

**Fig. 3.** Example of transition graph structure

#### **4.4 Nodes' Belief Sets**

A belief set is a set of formulas of a logical language, closed under logical consequence. For convenience, we will use belief sets within logical formulas. In such cases, the belief set can be seen as the conjunction of all its elements. Similarly, we sometimes define the value of a belief set through a logical formula, implicitly meaning that the belief set is the set of consequences of this formula. A belief set can also be seen as a set of interpretations, namely the interpretations under which all its formulas are true. From this point of view, the conjunction (resp. disjunction) of two belief sets is the intersection (resp. union) of the corresponding sets of interpretations. In our logical formalism, a transition function can be applied to a base logic belief set, using the correspondence with sets of interpretations. Notice that any transition function preserves conjunction and disjunction, and is monotonic with respect to implication.

To each belief node $N_T^k$ we associate a base logic belief set $B_T^k$ containing all properties that can be inferred at a time point $t$ under the hypothesis $N_T^k(t)$ and given the past observations. In other words, we want to find $B_T^k$ such that, for $t \in I_k$:

$$N_T^k(t) \land \bigwedge_{k'=0}^k Holds(O_{k'}, I_{k'}) \to Holds_{\{m\}}(B_T^k, \{t\}) \tag{2}$$

Let $w$ be a walk from the initial node $N^0_{Id}$ to a node $N_T^k$. We associate $w$ with a belief set $B(w)$ such that, if the sequence of events described by $w$ occurred between $t_0$ and the current time point $t$, and taking into account the past observations $O_0, \ldots, O_k$, we have $Holds_{\{f\}}(B(w), \{t\})$. This belief set can be defined as follows:

$$B(w) = \begin{cases} O_0 & \text{if } w \text{ is empty} \\ T_e(B(w')) \wedge O_k & \text{otherwise, with } w = (w', e) \text{ and } k = n_{ex}(w) \end{cases} \tag{3}$$

where $n_{ex}(w)$ is the number of external event transitions in $w$. Using $Holds_{\{m\}}(B, t) \wedge Occurs(e, t) \to Holds_{\{mi\}}(T_e(B), t)$ as well as $Holds_{\{s\}}(B, I) \wedge OccursSeq(s, I) \to Holds_{\{f\}}(T_s(B), I)$, it can be shown that $B(w)$ defined this way satisfies the desired property.
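As a worked instance of Eq. 3, consider a walk made of one internal transition followed by one external transition. The event symbols $e_1$ and $e_2$ are hypothetical and serve only to unfold the recursion; the number of external transitions is 0 after the first step and 1 after the second, which selects $O_0$ and then $O_1$:

$$
\begin{aligned}
B(()) &= O_0 \\
B\big(((e_1, in))\big) &= T_{e_1}(O_0) \wedge O_0 \\
B\big(((e_1, in), (e_2, ex))\big) &= T_{e_2}\big(T_{e_1}(O_0) \wedge O_0\big) \wedge O_1
\end{aligned}
$$

Each step first pushes the previous belief set through the transition function of the event and then filters the result with the observation of the interval in which the walk currently stands.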

As the hypothesis $N_T^k(t)$ states that there exists some walk $w$ from $N^0_{Id}$ to $N_T^k(t)$ such that the corresponding sequence occurred, the belief set $B_T^k$ can be defined as follows to satisfy Eq. 2:

$$B_T^k = \bigvee_{w \in walk(N_{Id}^0, N_T^k)} B(w) \tag{4}$$

where $walk(N^0_{Id}, N_T^k)$ is the set of walks from the initial node to $N_T^k$ in the transition graph.

The goal of the algorithm is to compute the belief set $B_T^k$ recursively.

#### **4.5 Building the Graph**

The goal of the algorithm is to build the transition graph and compute the nodes' belief sets recursively to match Eq. 4, so that Eq. 2 is satisfied for all belief nodes. As input, the algorithm is provided, one after the other, with the observations associated with each observation interval. Each time the observations associated with the next interval are received, the algorithm updates the graph to compute the nodes associated with this interval.

The following notations are used to describe the algorithm:


All nodes' belief sets are initialized to be inconsistent. When the observations associated with the first observation interval are given, the UpdateInitialInterval function is called (see Algorithm 1). Then, each time the observations associated with the next observation interval are received, the UpdateNextInterval function is called (see Algorithm 2). These two functions ensure that the $I_{current}$ variable refers to the last observation interval for which information has been received, and that the belief sets of the nodes associated with this interval (and all previous intervals) are correctly computed. Both functions call the recursive function UpdateNode (see Algorithm 3), which performs a depth-first exploration of the graph, updating the nodes' belief sets when necessary.

Each call of the recursive function UpdateNode corresponds to a walk in the graph. We define the set $W$ of explored walks as the maximal set of walks such that, for each node $N$, the associated belief set is the disjunction of all $B(w)$ with $w \in walk(N^0_{Id}, N) \cap W$. For short, we write $W_N = walk(N^0_{Id}, N) \cap W$. The algorithm is correct if, at the end of all recursive calls, $W$ contains all walks from the initial node to the current observation interval. During the execution of the algorithm, the following property of $W$ is maintained: if $W$ contains a walk $w$ which is not in the call stack, $W$ also contains all walks starting with $w$. In particular, when the algorithm terminates, the call stack is empty and, as $W$ contains all walks in the previous interval (or the empty walk), $W$ also contains all walks in the current interval. To maintain this property, the UpdateNode function ensures that if $W_{N_{pred}}$ contains a walk $w$ when it is called, then at the end of the execution $W$ contains all walks beginning with $(w, e)$. In the recursive case, the node's belief set is updated so that $W_N$ contains $(w, e)$. As the function is then called recursively for all event transitions $e'$, $W$ then contains all walks starting with $(w, e, e')$ for all $e'$, and thus all walks starting with $(w, e)$. In the base case, the belief set update has no effect, which means that $(w, e)$ is already in $W_N$ while not in the call stack, and thus all walks beginning with $(w, e)$ are already in $W$.

A key result for the termination of the algorithm is that cycles in a walk $w$ do not impact the computation of $B(w)$. Indeed, if $w = w' w_c$ with $w_c$ a cycle, it can be shown, using the monotonicity of transition functions, that $B(w) \to B(w')$. When the UpdateNode function is called for a walk $w = w' w_c$ with $w_c$ a cycle, the corresponding node $N$ has already been updated so that $w'$ is in $W_N$. Assuming that $w_c$ is the only cycle in $w$, the condition on the structure of $W$ ensures that, for all walks $w''$ in $W_{N_{pred}}$, $(w'', e)$ is already in $W_N$, except for $w$. As $B(w) \to B(w')$, $w$ is also already in $W_N$. The update thus has no effect on the belief set and the function returns. As a consequence, walks with cycles are never effectively explored, which ensures that the algorithm terminates, as long as the number of consistent nodes is finite.


#### **Algorithm 1.** UpdateInitialInterval()

$I_{current} := I_0$
$B(N^0_{Id}) := O_0$
**for** $e \in E_{in}$ **do**
  $UpdateNode(succ(N^0_{Id}, e), N^0_{Id}, e)$
**end for**

#### **Algorithm 2.** UpdateNextInterval()

**for** $N \in I_{current}$ **do**
  **for** $e \in E_{ex}$ **do**
    $UpdateNode(succ(N, e), N, e)$
  **end for**
**end for**
$I_{current} := next(I_{current})$

#### **Algorithm 3.** UpdateNode($N$, $N_{pred}$, $e$)

$$
\begin{array}{l}
B_{old} := B(N) \\
B(N) := B(N) \lor (T_e(B(N_{pred})) \land Obs(N)) \\
\textbf{if } B(N) \neq B_{old} \textbf{ then} \\
\quad \textbf{for } e' \in E_{in} \textbf{ do} \\
\quad\quad \mathit{UpdateNode}(succ(N, e'), N, e') \\
\quad \textbf{end for} \\
\textbf{end if}
\end{array}
$$
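The recursive update can be illustrated with a minimal Python sketch. It assumes that a belief set is stored as the set of states it allows, so that disjunction becomes set union and conjunction with the observation becomes set intersection. The `TransitionGraph` container, its field names and the two-node example are hypothetical and only serve to show that the recursion stops once a cycle brings no new information.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Belief = FrozenSet[str]  # a belief set, stored as the set of states it allows


class TransitionGraph:
    """Hypothetical container for the quantities used by UpdateNode."""

    def __init__(
        self,
        succ: Dict[Tuple[str, str], str],
        obs: Dict[str, Belief],
        transitions: Dict[str, Callable[[Belief], Belief]],
        internal_events: Tuple[str, ...],
    ) -> None:
        self.succ = succ                        # (node, event) -> successor node
        self.obs = obs                          # Obs(N): states compatible with N
        self.transitions = transitions          # T_e for each event e
        self.internal_events = internal_events  # E_in
        self.belief: Dict[str, Belief] = {n: frozenset() for n in obs}


def update_node(g: TransitionGraph, node: str, pred: str, event: str) -> None:
    old = g.belief[node]
    # B(N) := B(N) or (T_e(B(N_pred)) and Obs(N))
    g.belief[node] = old | (g.transitions[event](g.belief[pred]) & g.obs[node])
    if g.belief[node] != old:  # recursive case: the belief set grew
        for e in g.internal_events:
            nxt: Optional[str] = g.succ.get((node, e))
            if nxt is not None:  # assumption: a missing successor is skipped
                update_node(g, nxt, node, e)


def swap(belief: Belief) -> Belief:
    """Monotone transition that exchanges the states 'a' and 'b'."""
    return frozenset({"a": "b", "b": "a"}[s] for s in belief)


# Two nodes linked in a cycle by the single event "swap".
g = TransitionGraph(
    succ={("X", "swap"): "Y", ("Y", "swap"): "X"},
    obs={"X": frozenset("ab"), "Y": frozenset("ab")},
    transitions={"swap": swap},
    internal_events=("swap",),
)
g.belief["X"] = frozenset({"a"})
update_node(g, "Y", "X", "swap")
print(sorted(g.belief["X"]), sorted(g.belief["Y"]))
```

In this example the call on `Y` changes its belief set and recurses into `X`, where walking around the cycle adds nothing new, so the base case is reached and the function returns.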

#### **4.6 Querying the Graph**

Once the transition graph is constructed, we want to know, given a time point $t$, which properties are true at this time point. By construction, during an observation interval, the disjunction of all its belief nodes holds. Thus, a property is true at a time point within the observation interval if it is true in all belief nodes. Additionally, a property is possible (i.e. not false) if it is possible in at least one belief node. One may also be interested in what happens at the beginning (resp. at the end) of the observation interval, by looking only at the belief nodes that have consistent predecessors (resp. successors) in the previous (resp. next) interval. For example, a property is true at the beginning of the observation interval if it is true in all nodes that have a consistent predecessor in the previous interval. Knowing which properties are true at the beginning of, during, or at the end of each observation interval, we can infer whether a formula of the form $Holds_R(\varphi, I)$ is true, false or unknown according to current knowledge. Moreover, possible event sequences from $t_0$ to a time point $t$ in $I_k$ correspond to walks in the graph from the initial node $N^0_{Id}$ to a node associated with $I_k$, going through only consistent nodes.
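The three-valued answer described above can be sketched as follows, assuming that each belief node is stored as a set of states and that a property is a predicate on one state. The function name `status` and the pair-based state encoding are illustrative assumptions.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

HouseState = Tuple[str, str]  # hypothetical state: (Alice's room, Bob's room)


def status(prop: Callable[[HouseState], bool],
           beliefs: Iterable[FrozenSet[HouseState]]) -> str:
    """Return 'true', 'false' or 'unknown' for prop over one interval."""
    # Empty (inconsistent) belief nodes contribute no state.
    states = [s for b in beliefs for s in b]
    if not states:
        raise ValueError("no consistent belief node in this interval")
    if all(prop(s) for s in states):
        return "true"      # true in every belief node
    if not any(prop(s) for s in states):
        return "false"     # possible in no belief node
    return "unknown"       # possible, but not certain


interval = [frozenset({("b", "k")}), frozenset({("l", "k")})]
bob_in_kitchen = status(lambda s: s[1] == "k", interval)
alice_in_bedroom = status(lambda s: s[0] == "b", interval)
bob_outdoors = status(lambda s: s[1] == "o", interval)
print(bob_in_kitchen, alice_in_bedroom, bob_outdoors)
```

To query the beginning or the end of an interval, the same function can be applied to the subset of belief nodes that have a consistent predecessor or successor.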

#### **5 Application to the Location Problem**

We will now apply this algorithm to the home location problem. Here we assume that the house topology is known, and that a set of devices provides two types of location information over time: information about the number of people present in one room, and information about the location of a specific person. We also assume that only known people are present in the house. We thus have a set of persons $P$, a set of rooms $R$ and an adjacency relation $Adj \subseteq R \times R$. The property language is built with one predicate: $IsIn(p, r)$ with $p \in P$ and $r \in R$. An event is a person moving from one room to an adjacent room. The set of event symbols is defined as $E = \{Move(p, r_1, r_2),\ p \in P,\ (r_1, r_2) \in Adj\}$. A transition function corresponds to a subset of persons each moving from one room to another room. For convenience, as they lead to similar belief sets, we chose to group in the same belief node all transition functions for which people arrive in the same position, not taking into account their initial position. We consider a belief set as a disjunction of house states, where a house state is the conjunction of predicates of the form $IsIn(p, r)$, for all $p \in P$. A house state thus describes the position of all people in the house, and can be seen as an interpretation for the base logic. Here, for convenience, we use a different language for observations. An observation can be either the predicate $Count(r, N)$, with $r \in R$ and $N \subseteq \mathbb{N}$, meaning that the number of persons in room $r$ is in $N$, or the predicate $Located(p, R')$, with $p \in P$ and $R' \subseteq R$, meaning that the person $p$ is in one of the rooms in $R'$. Adding an observation to a belief set can be done simply by removing the incompatible house states from the disjunction.
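A minimal sketch of this representation is given below, assuming that a house state is encoded as a tuple giving the room of each person and that a belief set is a set of such tuples. The sets of people and rooms follow the Alice and Bob use case. The helper names and the sequence of observations are illustrative assumptions, and the bounded range standing in for the set of positive integers relies on the fact that a room holds at most $|P|$ people.

```python
from itertools import product
from typing import Callable, Container, FrozenSet, Tuple

PEOPLE = ("A", "B")                  # P
ROOMS = ("o", "e", "k", "l", "b")    # R

HouseState = Tuple[str, ...]         # room of each person, in PEOPLE order
Observation = Callable[[HouseState], bool]


def all_house_states() -> FrozenSet[HouseState]:
    """Belief set with no information: every assignment of people to rooms."""
    return frozenset(product(ROOMS, repeat=len(PEOPLE)))


def count(room: str, allowed: Container[int]) -> Observation:
    """Count(r, N): the number of persons in room r belongs to N."""
    return lambda state: sum(r == room for r in state) in allowed


def located(person: str, rooms: Container[str]) -> Observation:
    """Located(p, R'): person p is in one of the rooms of R'."""
    return lambda state: state[PEOPLE.index(person)] in rooms


def add_observation(belief: FrozenSet[HouseState],
                    obs: Observation) -> FrozenSet[HouseState]:
    """Remove the house states that are incompatible with the observation."""
    return frozenset(s for s in belief if obs(s))


ZERO = {0}
SOME = range(1, len(PEOPLE) + 1)  # stands in for the positive integers

belief = all_house_states()
belief = add_observation(belief, located("B", {"k"}))  # Bob is in the kitchen
belief = add_observation(belief, count("k", SOME))     # someone is in the kitchen
belief = add_observation(belief, count("l", ZERO))     # living-room is empty
belief = add_observation(belief, count("e", ZERO))     # entrance is empty
print(sorted(belief))
```

After these hypothetical observations, three house states remain: Bob is in the kitchen and Alice is outdoors, in the kitchen or in the bedroom.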

We applied this algorithm to the use case described in Sect. 2. We have $P = \{A, B\}$ for Alice and Bob, and $R = \{o, e, k, l, b\}$ for outdoor, entrance, kitchen, living-room and bedroom. The sensors deliver information about Bob's location and about the number of persons in the entrance, the living-room and the kitchen. We denote $out = \{o\}$, $home = \{e, k, l, b\}$, $zero = \{0\}$ and $some = \mathbb{N}^*$. The scenario is composed of seven steps, corresponding to observation intervals. The transition graph resulting from the algorithm is described in Table 1. Notice that in the last interval, the only possibility is that Alice is in the bedroom and Bob in the kitchen, which is more precise than what can be inferred using only the last observations. The implemented algorithm thus worked successfully on the defined test case.

**Table 1.** Results of the algorithm applied to the Alice and Bob use case

#### **6 Conclusion and Perspectives**

At the application level, our work has shown that it is possible to infer accurate location information from a small number of sparse, low-level measurements. For instance, as shown by our illustrative example, our approach makes it possible to find out which rooms several known occupants of the home can be located in, even if only a few of them can be identified through their mobile phone or RFID card, only very low-level sensors and detectors are used, and only some rooms of the house are instrumented. The formalism and logical framework that we have defined offer multiple levels of genericity. In the Internet of Things (IoT) domain, a similar approach can be applied to identify the status of a piece of equipment (device, system, machine) through sparse observations of the equipment and of its environment.

On a more general level, we believe that our approach, including the modeling technique and algorithms, can be applied to a range of application domains. Characteristics of the target domains include the fact that information in these domains is organized as interrelated chunks of data and that it is known how modifying one chunk can affect the chunks related to it.

In future work, this approach could also be extended to include probabilistic reasoning. This would make it possible to tackle the problem of imperfect sensors that occasionally provide erroneous information, or to take into account the fact that events and situations may occur with different probabilities.

#### **References**



## **A Natural Language Generation Technique for Automated Psychotherapy**

Graham Mann, Beena Kishore, and Pyara Dhillon

Murdoch University, 90 South Street Murdoch, Perth, WA 6150, Australia {g.mann,b.kishore,p.dhillon}@murdoch.edu.au

**Abstract.** The need for software applications that can assist with mental disorders has never been greater. Individuals suffering from mental illnesses often avoid consultation with a psychotherapist, because they do not realize the need, or because they cannot or will not face the social and economic consequences, which can be severe. Between ideal treatment by a human therapist and self-help websites lies the possibility of a helpful interaction with a language-using computer. A model of empathic response planning for sentence generation in a forthcoming automated psychotherapist is described here. The model combines emotional state tracking, contextual information from the patient's history and continuously updated therapeutic goals to form suitable conceptual graphs that may then be realized as suitable textual sentences.

**Keywords:** Natural language generation · Conceptual graphs · Model-based reasoning

## **1 Introduction**

Many parts of the world now face a serious mental health care treatment gap, especially in low to middle income countries and in non-urban areas of high income countries [1]. The reasons are complex, but much of the shortage is caused by a lack of available skilled psychiatric professionals and by a failure of patients to engage, for economic reasons or because of social stigma [2]. A review of evidence shows that there are good reasons to think computerized therapy may be one effective approach to overcoming these difficulties [3]. While we do not imagine that such systems would be equivalent to consultation with skilled human psychiatrists, even existing mental health care apps can play a role and would often be better than nothing. In the case of "talking" therapies (those relying primarily on psychiatric interviews), software can today carry out natural conversations with a patient, simulating the role of the therapist. This paper deals with the formation and expression of appropriate responses to be used by an automated therapist during a consultation. It presents a conceptual graph (CG) based language theory realized as a computer model of language generation called Affect-Based Language Generation (ABLG).

Current trends in conversational systems tend to favour machine learning (ML) approaches, typically employing neural networks (NN), but we believe that these are not ideal in this application, for the following reasons. First, the knowledge and executable skills of a machine learning system are typically opaque and lack auditability, and so lack trust [4]. This is a serious drawback in medical applications. Knowledge and skills in conceptual graph (CG) based systems are as a rule much more human-readable and subject to logical reasoning that can readily be comprehended and verified. Second, NN-based or statistical ML approaches (with the possible exception of Bayesian learners) cannot easily incorporate high-level, *a priori* knowledge into their processing [5]. This disadvantages learners in domains where such high-level knowledge is available or must be policy. But by virtue of their standardized knowledge representation, CG systems can mix prior knowledge with incoming data relatively easily. Third, ML language systems are typically very data-hungry, and while large corpora of language knowledge are now available, using these is computationally expensive. By contrast, model-based CG systems can, with some labour, be made to work with a relatively small amount of domain-specific language knowledge and with little or no learning.

In the rest of this paper, Sect. 2 proposes a system model that draws on tracked emotional states, patient's utterances and background information about the patient with pragmatic cues and goals from a control executive to generate a suitable response in conceptual form. Section 3 briefly describes our experimental implementation, consisting of heuristics to fetch instances of the above informative content, and calling on conceptual functions to filter these and bring them together to form CGs that can be realised as linear texts. The whole process is controlled by an executive expert system implementing psychotherapeutic rules. Finally, Sect. 4 concludes with some current challenges of this approach and its prospects for testing and further development.

## **2 Sources Informing the Generation of Responses**

Sentence generation involves the planning of conceptual content first, and then linguistically encoding it into a grammatical string of words [6]. Our idea of generating sentences is based on a therapeutic process informed by representations of the patient's current emotional state, representations of their pre-clinical interview history, and representations of their on-going utterances.

#### **2.1 Tracking of Patient's Expressed Emotions**

It is difficult to imagine a successful psychotherapist who is not concerned with the emotional state of the patient. Even behaviourist therapies that emphasise overt actions in response to stimuli over mental state today include emotions as a recognised behavioural response, if not an important internal state determining them [e.g., 7]. The evidence is clear that the patient's emotional state is important for treatment and needs to be closely monitored [8]. This state must be dealt with properly to maintain patients in a comfortable place, while at the same time empathizing, noting the significance of the emotion and helping the patient to find meaning from it. Much emotional information can be obtained by monitoring a speaker's tone of voice, facial expression or other body language. Today's mobile devices, with their microphones and cameras, could hope to read these forms of expression, but since at this stage our work is about testing a theory of natural language generation, not a practical app, we use only text.

According to the survey conducted by Calvo and D'Mello [9] on models of affect, early approaches to detecting emotional words in text include lexical analysis of the text to recognize words that are indicative of the affective states [10] or specific semantic analyses of the text based on an affect model [11]. The current work adapts Smith & Ellsworth's six-dimensional model [12] to make a system that can better grasp the subtleties of patient affect. Their chosen modal values on the principal component states for 15 distinguished emotional states are shown in Table 1.


**Table 1.** Mean locations of labelled emotional points in the range [−1.5, +1.5] as compiled in Smith & Ellsworth's study.

A patient's textual utterance is compared to accumulated word-bags that offer clues to the expressed emotions, plus a filter to exclude references to the emotions of others. These classify the expressed emotion into one of Smith & Ellsworth's 15 ideal values, the vectors of which locate the expression as a single point in a six-dimensional affective space. This allows mappings of complex emotional states into a consistent hypervolume so that, for example, the "distances" between two states can be computed. It also allows emotive subspaces to be defined. One way that emotional tracking can be used is for the appropriate application of sympathy. We define a "safe region" in the affective space. The therapist may continue the therapy as long as the patient's tracked emotional state stays within the safe region. A single point was chosen as the "most distressed" emotional state (we used {1.10 1.3 1.15 1.0 −1.15 2.0}). The simplest model of a safe region is the space outside a hypersphere of fixed radius centred on this point. The process is then reduced to finding the Euclidean distance between the current emotional state and the above-defined distressed centre.

$$
\Delta\Omega = \sqrt{(P_i - P_j)^2 + (E_i - E_j)^2 + (C_i - C_j)^2 + (A_i - A_j)^2 + (R_i - R_j)^2 + (O_i - O_j)^2}
$$

If the calculated distance is greater than an arbitrarily defined tolerance threshold (radius), the patient's current emotional state is considered safe. The calculated distance of an emotional state {1.15 0.09 1.3 0.15 −0.33 −0.21} from the above-defined distress point would be 1.70. For an arbitrary tolerance radius of 2.5 units from the distress point, the patient's tracked emotive state would not be in the safe region. A more sophisticated approach would be to map examples of real patient distress into a convex volume of the emotional space and then measure the distance from the current tracked emotional state to the nearest point on that volume.
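The safe-region test can be sketched in a few lines of Python. The two vectors and the radius are taken from the text, with the fifth coordinate of the distress point read as −1.15 (an assumption about its sign). Under that reading, the reported distance of 1.70 is reproduced when the first five coordinates are compared, whereas including the sixth coordinate as printed gives about 2.79. The printed vectors should therefore be checked against the original data before being reused. The function names are illustrative.

```python
import math
from typing import Sequence

# Coordinates as given in the text; the sign of the fifth distress
# coordinate (-1.15) is an assumption.
DISTRESS = (1.10, 1.3, 1.15, 1.0, -1.15, 2.0)
TRACKED = (1.15, 0.09, 1.3, 0.15, -0.33, -0.21)
RADIUS = 2.5  # arbitrary tolerance radius used in the text


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of the affective space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_safe(state: Sequence[float], centre: Sequence[float],
            radius: float = RADIUS) -> bool:
    """A state is safe when it lies outside the hypersphere around centre."""
    return distance(state, centre) > radius


d5 = distance(TRACKED[:5], DISTRESS[:5])  # first five coordinates only
d6 = distance(TRACKED, DISTRESS)          # all six coordinates as printed
print(f"{d5:.2f} {d6:.2f}")
print(is_safe(TRACKED[:5], DISTRESS[:5]))
```

With the five-coordinate distance, the tracked state falls inside the radius of 2.5 and is classified as not safe, which agrees with the conclusion stated above.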

#### **2.2 Conceptual Analysis of Patient's Utterances**

Study of a reference corpus of 118 talking therapy interviews [13] reveals that these patient utterances can be long and rambling, often incoherent and quite difficult for a person, much less a machine, to comprehend. While we have a conceptual parser, SAVVY, capable of converting real, non-grammatical paragraphs into meaning-preserving CGs [14], it was not developed for use in this domain. For the present work we do not intend to improve it to the point of creating meaningful conceptual representations for most of the utterances observed in our corpus. Conceptual parsers depend on an ontology in the form of a hierarchy of concepts, a set of relations and a set of actors. Manually creating representations of all the terms used in those interviews for SAVVY would be a difficult and time-consuming task. (This most serious of drawbacks for conceptual knowledge-based systems is now being addressed in automated ontology-building machines [e.g. 15]). Our focus in this study is the *generation* of language. Yet this kind of psychotherapy is essentially conversational, so we must allow the conceptual representations of patient utterances to be an input even to test response formation. Therefore, SAVVY will be adapted to accept selected patient utterances of interest. In some cases, to keep the project manageable, we hand-write plausible input CGs to avoid diverting too much time and energy away from our generation pipeline.

#### **2.3 Using Context to Inform the Planning Process**

In regular clinical practice, the first step for a new patient is an admitting (or triage) interview, that can capture important biographical details, a presenting complaint, background histories, and perhaps an initial diagnosis. Because we wish our model of language generation to account for existing, contextual information, we will not actively model this initial interview, but rather only subsequent interviews that have access to this previously gathered background. A set of background topics that should be sought during an admitting interview is described by Morrison [16]. Our current model draws 12 topics from this source and adds three extra topics specific to our clinical model.

#### **2.4 Executive Control**

An executive system based on a theory about how therapy should be done is needed for overall control. At each conversational turn, the executive should recommend the best "pragmatic move" and therapeutic goal for the response. This allows for the selection and instantiation of appropriate high-level conceptual templates that form the therapist's utterances to support, guide, query, inform or sympathize with the patient as appropriate during the treatment process. Our executive is based on the brief therapy of Hoyt [17] and the solution-based therapy of Shoham et al. [18]. As recommended by Hoyt, the focus is on negotiating treatment practices, not diagnostic classification. However, in this experiment a working diagnosis might become available as a result of the therapy or be input as background knowledge.

For a natural interviewing style, the executive must allow its goal-seeking behaviour to be interrupted by certain imperatives imposed by conversational conventions and good clinical practice. If the patient asks a question, this deserves some kind of answer. If the patient wishes to express some attitude or feeling about some point, that should usually be entertained immediately. If the patient's estimated emotional state falls into distress, it is important that the treatment model is suspended until the patient can be comforted and settled. Similarly, if rapport with the patient is lost (the quality of the patient's responses deteriorates), special steps must be taken to recover this before anything else can be done. We call these *forced* responses, to distinguish them from less obligatory pragmatic moves, which in our model are driven by key goals in the therapy.
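The interruption of goal-seeking behaviour by forced responses can be pictured with a short sketch. The flag names, the move labels and in particular the priority order among the four forced responses are assumptions made for illustration. The text only states that such imperatives take precedence over goal-driven moves.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TurnState:
    """Hypothetical per-turn indicators derived from the patient's input."""
    in_distress: bool = False
    rapport_lost: bool = False
    asked_question: bool = False
    expressed_feeling: bool = False


# Forced responses, checked in an assumed order of priority.
FORCED_MOVES = (
    ("in_distress", "comfort"),
    ("rapport_lost", "recover_rapport"),
    ("asked_question", "answer_question"),
    ("expressed_feeling", "acknowledge_feeling"),
)


def select_move(state: TurnState, goal_move: str) -> Tuple[str, bool]:
    """Return (move, forced): a forced response if one applies,
    otherwise the move proposed by the current therapeutic goal."""
    for flag, move in FORCED_MOVES:
        if getattr(state, flag):
            return move, True
    return goal_move, False


calm_turn = select_move(TurnState(), "query_chief_complaint")
question_turn = select_move(TurnState(asked_question=True), "query_chief_complaint")
distress_turn = select_move(
    TurnState(in_distress=True, asked_question=True), "query_chief_complaint"
)
print(calm_turn, question_turn, distress_turn)
```

Keeping the forced responses in a single ordered table makes the precedence explicit and easy to audit, which is in keeping with the transparency argument made for rule-based systems.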

In most cases, a conceptual structure representing a suitable therapist's response can be formed by unifying pragmatically selected schemata with content-bearing information from the other sources. This process is to be handled by heuristic rules that must be sufficiently general to keep the number needed as low as possible. In a few cases, a single standardized expressive form can be accessed without the need for unification.

#### **2.5 Response Generation Architecture**

The proposed architecture of the ABLG system relies on three principal processes (Fig. 1): Preparing input for the Therapeutic Expert, the Therapeutic Expert System, and the Surface Realization System. Based on the input sources, heuristic tests set the values of key variables controlling the behaviour of the Therapeutic Expert, such as patient type, clarity of the patient's chief complaint, the patient's readiness to change, their current emotional state, and their rapport with the therapist. At each conversational turn, the expert system recommends the best pragmatic move to the Surface Realisation System. This in turn chooses a feature structure template based on the pragmatic move recommended by the expert system. The template slot filler will fill in the template with relevant content, drawn from the CG representation of the patient's recent utterances, or looked up from the background database. Lastly, the YAG (Yet Another Generator) [19] realization library will convert the feature structure into a grammatically correct sentence for output. In some instances the Therapeutic Expert System will recommend a canned response, which can be directly output without using the Surface Realisation System.

**Fig. 1.** Architecture of Affect Based Language Generation (ABLG) system

## **3 Implementation Details**

To track emotions, we are experimenting with computationally "cheap" heuristics (meaning that, relative to machine learning approaches, logical rules on CGs do not consume very many CPU cycles) that can distinguish the patient's current emotional states directly from the text, though this has the disadvantage that it does not model cognitive aspects of emotion. To bring the patient's conversational utterances into the picture, a text-to-CG parser is required. But even if it were feasible to construct complete representations for every utterance performed by a patient, this would not be desirable, because from analysis of the corpus, surprisingly few such representations would actually have useful implications for treatment, at least within our simplified model. Our conceptual parser, SAVVY, can do this because it assembles composite CGs out of prepared conceptual components that are already pre-selected for the domain of use to which they will be put.

A simple database currently provides background knowledge for our experiments. Each entry in the knowledgebase is a history list of zero or more CGs, indexed by both a patient identifier and one of the 15 background topics (Sect. 2.3) such as suicide\_attempts, willingness\_to\_change and chief\_complaint. Entries may be added, deleted or modified during processing, so the database can be used as a working memory to update and maintain therapeutic reasoning over sessions. Initially these entries are provided manually to represent information from the pre-existing admitting interview.
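The indexing scheme described above can be sketched as a small in-memory store. The class name, its methods, and the use of strings in an approximate linear CG notation to stand in for conceptual graphs are all assumptions made for illustration. The topic names are those mentioned in the text.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

CG = str  # placeholder: a conceptual graph written in linear notation


class BackgroundStore:
    """History lists of CGs, indexed by (patient identifier, topic)."""

    def __init__(self) -> None:
        self._entries: DefaultDict[Tuple[str, str], List[CG]] = defaultdict(list)

    def add(self, patient_id: str, topic: str, cg: CG) -> None:
        self._entries[(patient_id, topic)].append(cg)

    def history(self, patient_id: str, topic: str) -> List[CG]:
        # A copy is returned, so callers cannot alter the stored history.
        return list(self._entries.get((patient_id, topic), []))

    def latest(self, patient_id: str, topic: str) -> Optional[CG]:
        entries = self._entries.get((patient_id, topic), [])
        return entries[-1] if entries else None

    def delete(self, patient_id: str, topic: str) -> None:
        self._entries.pop((patient_id, topic), None)


store = BackgroundStore()
store.add("patient-01", "chief_complaint",
          "[PERSON: patient] <- (EXPR) <- [ANXIETY]")
store.add("patient-01", "willingness_to_change",
          "[PERSON: patient] <- (AGNT) <- [WANT] -> (OBJ) -> [CHANGE]")
print(store.latest("patient-01", "chief_complaint"))
print(store.history("patient-01", "suicide_attempts"))
```

A history list with zero entries is simply an absent key, so looking up a topic that was never recorded returns an empty list rather than raising an error.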

Psychiatric expertise is represented by a clinical Expert System Therapist, based on TMYCIN [20]. Consultation of the system is performed at each conversational turn, informed by the current state of variables from the inputs. Backward-chaining inference maintains internal state variables and recommends the best "pragmatic move" and "therapeutic goal". These parameters allow for the selection and instantiation of appropriate high-level templates that, when elaborated, are linearized into output texts. Further implementation details can be found in [21].

#### **4 Conclusion**

This generation component is still in development, so no systematic evaluation has yet been conducted. Some components have been coded and unit tested. Getting the heuristics of the system to interact smoothly with each other is a challenge; that is to be expected in this modelling approach. We are concerned about the number of templates that may be required, particularly at the surface expression level. If they become too difficult or too many to create, the method might become infeasible. The heuristic tests are not difficult to write, but are, of course, imperfect. Also, we have not fully tested the emotion tracking on many real patient texts so far.

Our planned evaluation has two parts. First, a systematic "glass-box" analysis will discover the strengths and limitations of the generation component, particularly with respect to the generality of the techniques. Second, the "suitability", "naturalness" and "empathy" of the response generation for human use will be tested, using a series of ersatz patient interviews (to avoid the ethical complications of testing on real patients). Human judges (students in training to be psychotherapists) will be provided with background information and example patient utterances as well as the actual responses generated by the system. The judges will then rate these transcripts on those variables using their own knowledge of therapy. Finally, we reiterate that if hand-built conceptual representations can be practically built up using existing methods, the effort will be worthwhile if the systems are then more transparent and auditable than NN or statistical ML systems, and thus more trustworthy.

#### **References**



## **Creative Composition Problem: A Knowledge Graph Logical-Based AI Construction and Optimization Solution Applied in Cecilia: An Architecture of a Digital Companion Artificial Intelligence (AI) Agent System Composer of Dialogue Scripts for Well-Being and Mental Health**

Mauricio Javier Osorio Galindo<sup>1</sup> and Luis Angel Montiel Moreno<sup>2</sup>

<sup>1</sup> Universidad de las Americas Puebla, Ex Hacienda Sta. Catarina Mártir S/N, San Andrés Cholula, Puebla 72810, México mauricioj.osorio@udlap.mx <sup>2</sup> 71 pte 1505b Puebla, Puebla 72450, Mexico https://dblp.org/pid/o/MauricioOsorio.html

**Abstract.** The contribution of this work is to define the Creative Composition Problem (CCP) for human well-being optimization through the construction of a knowledge graph, using knowledge representation and logic-based artificial intelligence reasoning and planning, where the optimal solution is computed by dynamic programming or logic programming. The Creative Composition Problem is embedded within Cecilia, an architecture of a digital companion artificial intelligence agent system that composes dialogue scripts for well-being and mental health. The Cecilia framework is instantiated in the well-being and mental health domain for the optimal well-being development of first-year university students. We define the Problem of Creating a Dialogue Composition (PCDC) and propose a feasible and optimal solution to it. CCP is instantiated in this applied domain to solve PCDC, optimizing the mental health and well-being of the student. CCP as PCDC is applied to maximize the mental health of the student, and also the smoothness, coherence, enjoyment and engagement of each dialogue session that is composed. Cecilia helps students to manage stress and anxiety in an attempt to prevent depression. Students can interact with the digital companion through questions and answers. While the system "learns" from the user, it allows the user to learn from herself. Once the student discovers elements that she had not noticed, she will find a better way to improve by discovering her points of improvement.

**Keywords:** Knowledge graph · Knowledge representation · Creative composition · Reasoning planning system · Dialogue composition · Logic programming · Well-being/Mental health optimization · Digital companions

#### **1 Introduction**

Research by the World Health Organization (WHO) [90] concludes that stress is the world mental health disease of the 21st century and may be the trigger for depression and even suicide if it is not treated correctly. WHO estimates that, worldwide, suicide is the second leading cause of death in the group of 15 to 29 years of age and that more than 800,000 people die due to suicide every year.

Stress-related illnesses also generate high economic losses, since sick people and those who care for them reduce their productivity both at home and at work. According to data from the WHO, 450 million people in the world suffer from at least one mental disorder.

Well-being (meaning the absence of anxiety, depression and stress) and physical health have been studied by many scientists. Elizabeth H. Blackburn, Carol W. Greider and Jack W. Szostak were awarded the 2009 Nobel Prize in Physiology or Medicine. They show that telomerase activity is a predictor of long-term cellular viability, which decreases with chronic psychological distress [35]. E. H. Blackburn et al. showed that mindfulness may exert effects on telomerase activity through variables involved in the stress appraisal process [14]. According to the work of Okoshi Tadashi et al. [59], Technologies of Inclusive Well-being is a field of study that assumes positive technology has the capacity to increase emotional, psychological, and social well-being, and that investigates how information and communication technologies (ICT) empower and enhance the quality of personal experience in these areas. Economists and governments are starting to focus on well-being and "Gross National Happiness" as a new metric for measuring the status of nations.

We have proposed *Cecilia*, an architecture of a digital companion artificial intelligence agent system that composes dialogue scripts for well-being and mental health. The core of our proposal in the design of Cecilia as an inclusive technology is the use of logical declarative Artificial Intelligence (AI) languages as reasoning-planning systems, which make it possible to implement the system responsible for defining and specifying the behaviour of Cecilia with the user. Cecilia should run on a smartphone, and students can interact with it through questions and answers; while Cecilia "learns" from the user, it also allows the user to learn from herself. Once the student discovers elements that she had not noticed, she can find a better way to improve her own well-being by discovering her points of improvement. Cecilia has been conceived as a virtual digital companion that assists the student while she improves her own skills and freely wants help from the system; once the student acquires full sovereignty over herself by mastering the skills proposed by Cecilia, there is no need to continue interacting with Cecilia. Cecilia is therefore not conceived as a system that creates dependency in the student. On the contrary, the aim is to help the student achieve a mature and healthy interdependence with herself, relatives, friends, society and nature by helping her acquire full sovereignty over herself through compassionate skills [16].

Cecilia is intended to be an intelligent agent system that supports all individuals, with emphasis on university students and young people.

Over the years, science has shown that the brain and the mind work synergistically, which is why the brain can be reorganized, re-educated and regenerated by forming new nerve connections or paths when a person learns to control the mind through therapies. Several techniques have been successful in supporting students in overcoming their psychological difficulties, as referred to in [61–63]: *Mindfulness* [9,38,78] and *Cognitive Therapy* [4,24,41], which can also be combined [46,83].

*Mindfulness* [9,38,78] is a way of becoming aware of our reality, giving us the opportunity to work consciously with our stress, pain, illness, loss or, in general, the problems of our life. Studies of mindfulness meditation over the past 20 years are promising [15,78] and offer insight into the specific cognitive processes by which it may serve as an antidote to cognitive stress states and benefit physical and psychological processes. Mindfulness oriented towards compassion and altruistic behaviour has been considered an important scientific field of study [16]. For instance, the Center for Compassion and Altruism Research and Education (CCARE) [85] was founded at the Stanford University School of Medicine in 2008.

*Cognitive Behaviour Therapy (CBT)* [4,24,41] was initially developed by J. Beck [4] as a treatment for distorted thinking and brief depression through the evaluation of negative thoughts that influence behaviour. CBT is a psychotherapy that proposes modifying thought to produce effective health improvement, as has been shown in over 2,000 research studies [24]. Including tools such as the techniques referred to in [61,63] for *finding the Element* [76], reaching *Flow states* [55], *Silence in Therapy* [42] and *Poetry Therapy* [52] adds value to our proposal, particularly for college students. Mindfulness and Flow states are distinct, independent behaviours; however, they can be alternated [34].

Cecilia can also answer questions about university matters and tries to create a bond with the student by taking her pleasures and hobbies into account. The enriching talks ("mild therapies") proposed for Cecilia are mainly based on mindfulness combined with cognitive therapy, together with advice on professional career preferences. They focus particularly on preventing and managing mild symptoms of stress, anxiety and depression, in order to reduce the risk of failure in university life due to learning problems and to optimize the mental health, well-being and behaviour of students when they face university challenges, as justified in the work of Ribeiro Icaro et al. [75]. Thus, during sessions with Cecilia, students are intended to understand, accept and "become a friend of" their minds and emotions, obtaining better performance both in school and in their personal life. As described in the work of Luksha Pavel et al. [45], these existential skills include an ability to set and achieve goals (willpower), self-awareness/self-reflection ability (mindfulness), an ability to learn/unlearn/relearn (self-development) relevant skills (e.g. skill-formation ability), and more. Based on the research of Richard Davidson referred to in [16], well-being is a skill to be learned. Well-being has four constituents, each of which has received serious scientific attention: 1. Resilience, 2. Outlook, 3. Attention and 4. Generosity. Each of these four is rooted in neural circuits, and each of these neural circuits exhibits plasticity. So if a person exercises these circuits, they will be strengthened.

The core type of dialogue for every dialogue session of Cecilia is *Maieutics*, described by Scraper Randy et al. in [84]. However, each single agent task micro-dialogue, as a secondary type of dialogue, can be one of the following, according to the categories stated by Douglas Walton and enumerated and specified in [88]: Persuasion, Inquiry, Discovery, Information-Seeking, Casual chat, Negotiation, Deliberation and Eristic.

The Cecilia architecture has been designed to include a Theory of Mind [80], extended with emotions [60,81], of the User Agent (the student) as a Logic Programming (LP) theory in the User Model. LP Knowledge Representation makes it possible to reason about and plan a Dialogue Composition (DC) that helps the user's human development, considering her beliefs, intentions, desires and emotions.

The main purpose of Cecilia is to develop the compassionate skills of the user. One property of true creativity is that it guarantees the common good of humanity; in our case we propose the Cecilia architecture and the solution of the CCP so that technology is ordered to the benefit of human beings, and the opposite would not be creativity. In [48–50] several results are enumerated in which science has shown how kindness and pro-social behaviours have a biological imperative: the creation of neural stem cells governing short-term memory and the expression of genes regulating the stress response are positively affected by motherly affect, a positive cognitive state influences a positive immune response and vice versa, etc. As Cindy Mason [48–50] has pointed out, the repeated interactions with the artifacts we create rub off on us. They are shaping and affecting us continually. Social and emotional relations influence our brain, our genes, our stress reaction and immune system, and even wound healing. These findings are significant not just for AI design but also for user interfaces, healthcare, education, and design intention in other fields; therefore, creating and designing artifacts that support positive emotions such as kindness and compassion is essential to the goal of human-level AI. There is a strong relation between compassion and motherly love [48,50]. The psychophysiophilosophy related to motherly love has been a topic of scientific research, and recent discoveries in the neurosciences [48,50] give hints on ways to increase motherly love in each of us, which can be applied through Haptic Medicine in students' daily lives as self-help. Cindy Mason has been a pioneer in defining intelligence in terms of compassion applied to the design of Artificial Intelligence artifacts. We have designed Cecilia along this line, where AI is founded on a definition of intelligence based on compassion.

A *Knowledge Graph (KG)* [22,56] mainly describes real-world entities and their interrelations, organized in a graph; defines possible classes and relations of entities in a schema; allows arbitrary entities to be potentially interrelated with each other; and covers various topical domains. KGs are networks of entities, their semantic types, properties, and relationships between entities. They may contain all kinds of entities that are relevant to a specific domain or to an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets. KGs can be associated with logic-based Knowledge Representation such as RDF, ontologies or argumentation.

The contribution of this work is to define *the Creative Composition Problem (CCP)* for Human Well-being Optimization by the construction of a Knowledge Graph using Knowledge Representation and logic-based Artificial Intelligence reasoning-planning, where the optimal solution is computed by Dynamic Programming or Logic Programming. The Creative Composition Problem is embedded within Cecilia, an architecture of a digital companion artificial intelligence agent system that composes dialogue scripts for Well-being and Mental Health. The Cecilia framework is instantiated in the Well-being and Mental Health domain for the optimal well-being development of first-year university students. The CCP is instantiated in this applied domain for the composition of dialogues that optimize the mental health and well-being of the student. We define *The Problem of Creating a Dialogue Composition (PCDC)* and propose a feasible and optimal solution to it. The CCP as PCDC is applied to maximize the mental health of the student, and also to maximize the smoothness, coherence, enjoyment and engagement each time a dialogue session is composed. The feasibility of our Cecilia design follows a Proof of Concept strategy [40]. The objectives of Cecilia are presented in [61–63].

Our paper is structured as follows. In Sect. 2 we discuss chat-bots applied to mental health and well-being. Sect. 3 presents how the Creative Composition Problem (CCP) is embedded within Cecilia, an architecture of a digital companion artificial intelligence agent system that composes dialogue scripts for Well-being and Mental Health. The CCP is instantiated in this applied domain as The Problem of Creating a Dialogue Composition (PCDC), optimizing the mental health and well-being of the student. Sect. 4 presents the definition, model and computation of the Creative Composition Problem (CCP) using graph theory and algorithms. Sect. 5 describes the Master-Agent Artificial Intelligent Composer (MAIC) as a creative reasoning-planning component formed by two modules. The first module performs *Diagnosis by reasoning*, based on a complex theory in an LP KB, and composes an instance of the CCP (it defines and constructs the graph that is the input of the CCP as PCDC problem). The second module *Prescribes* an optimal solution for the CCP as PCDC instance to optimize the well-being of the student. Sect. 6 presents the evaluation of Cecilia, and Sect. 7 discusses technologies suitable for solving the CCP and for the design of the Cecilia architecture. Finally, in Sect. 8 we present our conclusions.

#### **2 Related Work**

#### **2.1 Applied Chat-Bots for Mental Health Well-Being**

The benefits of chat-bots in the health care and well-being domain are described in [71]. In particular, that work outlines how chat-bots in health care may have the potential to provide patients with access to immediate medical information, recommend diagnoses at the first sign of illness, or connect patients with suitable health care providers (HCPs) across their community. Theoretically, in some instances, chat-bots may be better suited to meet patient needs than a human physician because they have no biological gender, age, or race and elicit no bias toward patient demographics. Chat-bots do not get tired, fatigued, or sick, and they do not need to sleep; they are cost-effective to operate and can run 24 h a day, which is especially useful for patients who may have medical concerns outside of their doctor's operating hours. Chat-bots can also communicate in multiple languages to better suit the needs of individual patients.

Early research in [71] has demonstrated the benefits of using health care chat-bots in many aspects, with accuracy comparable to that of human physicians. Patients may also feel that chat-bots are safer interaction partners than human physicians and are willing to disclose more medical information and report more symptoms to chat-bots. Psychological Internet interventions have frequently been evaluated and are viewed as a medium independent of time and place. They might be able to help reduce treatment barriers and expand the availability of care. Numerous studies [6] have shown that these interventions, often using cognitive-behavioral techniques, are comparable in their effectiveness to classical face-to-face psychotherapy. Psychological problems such as anxiety and depression are already being effectively addressed in this way.

As referred to in [5], the work of Samuel Bell et al. describes Woebot, a template-based chat-bot delivering basic CBT, which has demonstrated limited but positive clinical outcomes in students suffering from symptoms of depression.

The work of Eileen Bendig et al. referred to in [6] presents promising areas for the use of chat-bots in the psychotherapeutic context, which could include support for the prevention, treatment, and follow-up/relapse prevention of psychological problems and mental disorders. They could also be used preventively in the future, for example for suicide prevention. According to the work of Samuel Bell et al. [5], several promising studies have demonstrated the clinical efficacy of internet-based Cognitive Behaviour Therapy, which provides scalable treatment by removing the need for a face-to-face presence.

A survey of technologies for mental well-being is reported in [89]. The work of Diano Federico et al. [18] presents the state of the art in mindfulness-based mobile applications and the design of a mindfulness mobile application to help emotional self-regulation in people experiencing stressful situations. We invite the reader to consult the work of Baskar Jayalakshmi et al. [2], which reports a comparison of applied agents implemented for improving mental health and well-being.

The work of Jingar Monika et al. [37] explores how an intelligent digital companion (agent) can support persons with stress-related exhaustion in managing daily activities. It also explores how different individuals approach the task of designing their own tangible interfaces for communicating emotions with a digital companion.

The work of Inkster Becky [33] presents an empathy-driven conversational artificial intelligence agent (Wysa) for digital mental well-being that uses mindfulness as mild therapy in combination with transfer to a psychologist whenever the user asks for it. According to Samuel Bell et al., several studies have investigated the clinical efficacy of remote, internet-based and chat-bot-based therapy, but other factors, such as enjoyment and smoothness, are also important in a good therapy session.

The work of Cindy Mason [43] presents Intelligent Agent Software for Medicine. It describes how software agents that incorporate learning, personalization, proactivity, context-sensitivity and collaboration will lead to a new generation of medical applications that will streamline user interfaces and enable more sophisticated communication and problem-solving.

The work of Cindy Mason [51] argues that human-level AI requires compassionate intelligence: much more than just common sense about the world, it will require compassionate intelligence to guide interaction and build the applications of the future. The cognition of such an agent includes meta-cognition: thinking about thinking, thinking about feeling, and thinking about others' thoughts and feelings. Cindy Mason summarizes the core meta-architectures and meta-processes of EM-2, a meta-cognitive agent that uses affective inference and an irrational TMS.

An emotions ontology for collaborative modelling and learning of emotional responses is shown in [28].

The Multi-Disciplinary Case for Human Sciences in Technology Design is presented in [48]. It argues that connecting the dots between discoveries in neuroscience (neuroplasticity), psychoneuroimmunology (the brain-immune loop) and user experience (gadget rub-off) indicates that the nature of our time spent with gadgets is a vector in human health, mentally, socially and physically. The positive design of our interactions with devices can therefore have a positive impact on the economy, civilization and society. Likewise, the absence of design that encourages positive interaction may encourage undesirable behaviours. The consequences of the architecture of the 21st-century conversation between man and machine may last generations. AI and the Internet of Things are primary vectors for positive and negative impacts of technology. The work of [48] describes a growing body of co-discoveries occurring across a variety of disciplines that support the argument for human sciences in technology design.

The work of Cindy Mason [49] presents an Engineering Kindness architecture, which proposes building a machine with compassionate intelligence.

#### **2.2 Applied Knowledge Graph for Mental Health Well-Being**

The definition of, and related work on, Knowledge Graphs are described in [22]. The use of a Knowledge Graph in a health and well-being application, supporting decision making in organ transplantation using argumentation theory, is described in [56]. A survey of Knowledge Graphs applied in Clinical Decision Support Reasoning Systems is reported in [91]. The work in [79] shows the construction and application of a Knowledge Graph for the health domain using techniques that learn from electronic medical records. Finally, [31] presents different approaches to encoding graph structure into low-dimensional embeddings, using techniques based on deep learning and non-linear dimensionality reduction.

An extension of the Knapsack problem with weighted edges in the graph is described in [87]; it is computed in two phases, as a combination of a knapsack problem with a shortest-path problem.

In our proposal, the CCP as PCDC is applied to maximize the well-being and mental health of the student, and also to optimize the smoothness, coherence, enjoyment and engagement each time a dialogue session is composed. As far as we know, our Creative Composition Problem has not been described in the literature as an optimization problem. It differs from the work of Voloch [87] in that we maximize with respect to both the vertices and the weights on the edges. While Voloch combines Knapsack with Shortest Path, our problem resembles a combination of Knapsack and the Travelling Salesman Problem; we do not compute the optimal solution in two phases but in a single algorithm using dynamic programming.

## **3 Cecilia: An Architecture of a Digital Companion Artificial Intelligence Agent System Composer of Dialogue Scripts for Well-Being and Mental Health**

This section presents the architecture of our system Cecilia, which is detailed in [70].

The aim of this section is to introduce the reader to the context of our general 'Creative Composition Problem (CCP)', where the CCP is instantiated in a specific application domain (mental health and well-being optimization). The CCP is discussed in the next section, since the contribution of the present work concerns the definition, model and computation of the CCP using graph theory, algorithms and logic programming solvers. The CCP is instantiated in our Cecilia architecture in order to solve 'The Problem of Creating a Dialogue Composition (PCDC)'.

A contribution of the present work is our definition of *The Problem of Creating a Dialogue Composition (PCDC)*, for which we propose a feasible and optimal solution in the next section.

**Definition 1** *The Problem of Creating a Dialogue Composition (PCDC)***.** *Given a set of AI-task resources, the profit that each AI-task contributes to the development of the mental health and well-being of the student, the length of time that each AI-task lasts when interacting with the student, the profit that a sequence of two distinct related AI-tasks contributes to the coherence, enjoyment and smoothness of a session, the number of AI-task interactions expected for a single dialogue session, and the expected duration of the dialogue session, the problem is to compose a dialogue session as a sequence of AI-tasks that optimizes the mental health and well-being of the student with an optimally coherent, enjoyable and smooth session.*

An optimal solution for a PCDC instance is the **'Abstract Sequence Dialogue Session (ASDS)'** to be proposed by Cecilia, in which each AI-task is represented by an abstract token name. Each token is associated with Semantic Knowledge and is mapped to a script specified in the Basic Script/Resources Language (BSRL). Each BSRL script is written in a machine language that an imperative-language program interprets, managing the dialogue interaction as a chat-bot with the User Agent (in our case the student).

#### **3.1 Cecilia: A Master-Slave AI Agents Digital Companion System Design**

Cecilia defines a master-slave conceptual design following a centralized approach. Namely, we create hundreds of slaves (at least one thousand), each of which can perform a very concrete task. All the tasks correspond to interactions with the student. Each interaction is specified as an atomic micro-dialogue. An example could be a simple or complex task, such as teaching the student how to try a meditation exercise. Each task performed by a slave-agent is programmed in the *Basic Script/Resources Language (BSRL)*. Each slave has its associated Semantic Knowledge. The Semantic Knowledge of all slaves, plus a general theory of interaction among them, is written in a Logic Programming (LP) language.

So, the LP theory corresponds to the *Master-Agent Artificial Intelligent Composer (MAIC)*, which *reasons about and plans* a sequence of a few tasks (for an estimated 10–15 min session). These tasks are performed by our slaves and presented (coordinated) to the student by a distinguished slave (a BSRL program interpreter written in Python). We can make the following analogy. The LP agent is like a *master composer* of a symphony for a particular audience. The pianist is a particular slave that performs a specific task (playing the piano). The director corresponds to our distinguished *slave*, which actually coordinates the rest of the *slaves*. After the execution of the symphony, according to the feedback (applause, reviews, etc.), the composer hopefully learns how to create a better symphony.

The main concrete tasks of our intelligent agent are described in [61–63,70].

#### **3.2 The Cecilia Logical-Based AI Agent Digital Companion System**

Cecilia is a *Reasoning Planning System* that consists of a cycle of four sequential processes (modules), shown in Fig. 1 and described below.

**Fig. 1.** Architecture of Cecilia logical-based AI agent digital companion system.

*I. Abstract Script Dialogue Session (ASDS)* is generated by MAIC in this process (Fig. 2). The ASDS is a composition of a sequence of slave-agent tasks to be performed by Cecilia as a single dialogue session with the student. MAIC basically consists of two KB-reasoning modules represented and specified via Answer Set Programming (ASP). The lower one consists of a logical theory that **Diagnoses**, generating a set of recommendations (resources/assets) that correspond to *constructing the graph* of a CCP as PCDC instance. The higher module consists of an ASP program that proposes the ASDS plan, solving a specific problem based on the constructed graph and providing a **Prescription** in dialogue form to the student in order to optimize her mental health and well-being. This second stage is formally specified as an optimization problem, *The Problem of Creating a Dialogue Composition (PCDC)*, which is an instance of **The Creative Composition Problem**.

**Fig. 2.** I. Abstract script generation

An optimal solution for a PCDC instance is the intended ASDS to be proposed by Cecilia.

Figure 3 is an example of an abstract dialogue session built by MAIC.

*II. Concrete Dialogue Script Generation.* Each AI-Task in the composed dialogue sequence (the CCP optimal solution) is translated into a single BSRL script by concatenation.

Figure 4 is an example of a concrete AI-Task dialogue script, 'questioning/answering student w.r.t. Finding Element', specified as a BSRL program for its abstract token name 'c2'.

**Fig. 3.** Example of an abstract dialogue session built by MAIC.

**Fig. 4.** Example of an AI-Task 'questioning/answering student w.r.t. Finding Element' BSRL program.
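The translation step II can be pictured with a minimal Python sketch, assuming that scripts are stored as plain text keyed by abstract token name. The library contents are placeholders and are not real BSRL syntax; only the token 'c2' and its task come from the text, while `SCRIPT_LIBRARY`, `compose_session_script`, and the tokens 'c1' and 'c3' are hypothetical names introduced for illustration.

```python
# Hypothetical script library: abstract token name -> script text.
# The strings are placeholders and are NOT real BSRL syntax.
SCRIPT_LIBRARY = {
    "c1": "<script for a greetings micro-dialogue>",
    "c2": "<script for questioning/answering w.r.t. Finding Element>",
    "c3": "<script for a short meditation exercise>",
}


def compose_session_script(abstract_session, library, separator="\n"):
    """Concatenate the scripts of an abstract dialogue session, in order."""
    missing = [token for token in abstract_session if token not in library]
    if missing:
        # Fail loudly rather than silently dropping an AI-Task.
        raise KeyError(f"no script registered for tokens: {missing}")
    return separator.join(library[token] for token in abstract_session)


# An abstract session is an ordered list of token names.
session_script = compose_session_script(["c1", "c2", "c3"], SCRIPT_LIBRARY)
print(session_script)
```

Keeping the abstract session as a list of tokens and resolving scripts only at this stage means the composer can reason over tokens alone, which matches the separation between steps I and II.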

*III. Dialogue Interpreter Chat-bot* corresponds to the director of the orchestra that executes the composed dialogue session (Single BSRL program) as interactions of AI-Tasks with the student.

We present a simple example of a conversation in Fig. 5.

*IV. Feedback Module* is an extraction process of relevant information and knowledge. This module filters a user conversation record to obtain the **Student Profile State (SPS)** updating the extensional Knowledge Base.

## **4 The Creative Composition Problem (CCP)**

This section presents the definition, model and computation of *The Creative Composition Problem (CCP)* using graph theory, algorithms and logic programming solvers. We formalize the CCP Knowledge Graph (KG) that MAIC uses within Cecilia to make a prescription, after this KG has been constructed by the diagnostic reasoning of MAIC. The CCP corresponds to *The Problem of Creating a Dialogue Composition (PCDC)* in our instantiation of the Cecilia framework for the mental health and well-being domain. Using the Knowledge Graph constructed by the diagnosis, the prescription builds a composition sequence of AI-task interactions in the form of micro-dialogues joined into a single Dialogue Composition Session (a single composed dialogue script). The goal is to optimize the mental health and well-being of the student (user agent) and, at the same time, to optimize the links between interactions so as to provide a smooth, enjoyable and coherent dialogue session.

### **4.1 Formal Definition**

#### **CCP Graph Instance**

Let $G_{L,K}$ be a complete directed graph defined as the tuple $G_{L,K} = (V, E, P_V, P_E, W_V)$, where $V$ is a set of vertices;

$E$ is a relation on the set of vertices, $E = V \times V$;

$P_V$ is a function $P_V : V \to \mathbb{N}$ that represents the profit that each vertex contributes to the sequence that forms the optimal composition to be created;

$P_E$ is a function $P_E : E \to \mathbb{N} \cup \{0\}$ that represents the profit that a sequence of two distinct vertices related in $E$ contributes to the sequence that forms the optimal composition to be created;

$W_V$ is a function $W_V : V \to \mathbb{N}$ that represents the size associated with each vertex in $V$, which is used to restrict the length of the optimal composition sequence to be created.

$K$ is the maximal length, in terms of the sizes of its vertices, that an optimal composition sequence may have.

$L$ is the number of vertices that must make up the optimal composition sequence.

#### **Feasible Solution**

Is an $L$-tuple $X = [x_1, \dots, x_L]$, where $\{x_1, \dots, x_L\} \in 2^V$, $|\{x_1, \dots, x_L\}| = L$ and $\sum_{i=1}^{L} W_V(x_i) \le K$.

#### **Optimal Solution**

Is a feasible solution $X = [x_1, \dots, x_L]$ that maximizes $Z = \sum_{i=1}^{L} P_V(x_i) + \sum_{i=1}^{L-1} P_E((x_i, x_{i+1}))$.

**Remark**: In our instantiated domain problem for mental health and well-being there are always sufficient tasks with weight 1, hence there is always a feasible solution.

Solving the CCP means finding an 'Optimal Solution' of a given 'CCP Graph Instance'. The 'Optimal Solution' is also named *Optimal Creative Composition Sequence*. A 'Feasible Solution' is also named a *Creative Composition Sequence*.
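To make the definitions concrete, the following Python sketch checks feasibility and evaluates the objective $Z$ by exhaustive enumeration on a toy instance. It is only a reference for small inputs, not the method proposed in this paper. The vertex names, profits and sizes are invented for illustration, and edges missing from `P_E` are assumed to have profit 0.

```python
from itertools import permutations

# Toy instance (all names and numbers are illustrative assumptions).
V = ["greet", "breath", "element", "poem"]
P_V = {"greet": 1, "breath": 5, "element": 4, "poem": 3}  # vertex profits
W_V = {"greet": 1, "breath": 3, "element": 2, "poem": 2}  # vertex sizes
P_E = {  # edge profits; pairs not listed are assumed to have profit 0
    ("greet", "breath"): 2, ("greet", "element"): 1,
    ("breath", "element"): 3, ("element", "breath"): 1,
    ("breath", "poem"): 2, ("element", "poem"): 4,
    ("poem", "element"): 1,
}


def is_feasible(seq, weight_v, L, K):
    """L distinct vertices whose total size does not exceed K."""
    return (len(seq) == L and len(set(seq)) == L
            and sum(weight_v[x] for x in seq) <= K)


def objective(seq, profit_v, profit_e):
    """Z = vertex profits plus profits of consecutive ordered pairs."""
    vertex_part = sum(profit_v[x] for x in seq)
    edge_part = sum(profit_e.get((a, b), 0) for a, b in zip(seq, seq[1:]))
    return vertex_part + edge_part


def brute_force_ccp(vertices, profit_v, profit_e, weight_v, L, K):
    """Return (best Z, list of all optimal sequences); (None, []) if infeasible."""
    best_value, best_seqs = None, []
    for seq in permutations(vertices, L):
        if not is_feasible(seq, weight_v, L, K):
            continue
        z = objective(seq, profit_v, profit_e)
        if best_value is None or z > best_value:
            best_value, best_seqs = z, [list(seq)]
        elif z == best_value:
            best_seqs.append(list(seq))
    return best_value, best_seqs


print(brute_force_ccp(V, P_V, P_E, W_V, L=3, K=6))
```

For this toy instance with $L = 3$ and $K = 6$, the optimum is $Z = 15$, reached by the sequence greet, breath, element: vertex profits $1 + 5 + 4$ plus edge profits $2 + 3$.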

#### **4.2 Dynamic Programming Definition of CCP**

Given a CCP instance $G_{L,K} = (V, E, P_V, P_E, W_V)$, we compute the optimal solution using a dynamic programming strategy. For a subset $S$ of the vertices $V$, an initial vertex $s$ and a vertex $j$ such that $j \neq s$, let $C(S, j, k, l)$ be the maximal profit among all feasible solutions of the CCP (composition sequences of vertices in $S$, starting at vertex $s$ and ending at vertex $j$, with $l$ vertices and whose cumulative sum of vertex sizes is at most $k$).

When $|S| > 1$ we define $C(S, s, k, l) = -\infty$, where $0 \le k \le K$, $k \in \mathbb{N} \cup \{0\}$, $0 < l \le |V|$, $l \in \mathbb{N}$, since the composition sequence cannot both start and end at $s$.

Now, let us express $C(S, j, k, l)$ in terms of smaller sub-problems. We need to start at $s$ and end at $j$; if $i \in S - \{j\}$ is the second-to-last vertex, immediately before $j$ in the composition sequence, then the overall profit is the profit from $s$ to $i$, namely $C(S - \{j\}, i, k - W_V(j), l - 1)$, plus the profit of the vertex $j$ and the profit of the edge $(i, j)$. We must pick the best $i$:

$$\max\{C(S - \{j\}, i, k - W_V(j), l - 1) + P_V(j) + P_E((i, j)) : i \in S,\ i \neq j\}$$

where $S \subseteq V$, $j \in S$, $j \neq s$, $1 < l \le |V|$, $l \in \mathbb{N}$, $W_V(j) \le k$, $0 \le k \le K$, $k \in \mathbb{N} \cup \{0\}$.

$C(V - \{s\}, j, K, L)$ is the optimal solution of the CCP from vertex $s$ to vertex $j$; intermediate vertices are in $V - \{j\}$.

So the Recursive Definition to compute the CCP optimal solution is:

#### **Base case**

$$C(\{s\}, s, k, 1) = \begin{cases} P_V(s) & \text{if } W_V(s) \le k,\ 0 \le k \le K \\ -\infty & \text{if } W_V(s) > k,\ 0 \le k \le K \end{cases}$$

#### **Recursive case**

$$C(S, j, k, l) = \max\{C(S - \{j\}, i, k - W_V(j), l - 1) + P_V(j) + P_E((i, j)) : i \in S,\ i \neq j\}$$

where $S \subseteq V$, $j \in S$, $j \neq s$, $l > 1$, $W_V(j) \le k$, $0 \le k \le K$.

In our instantiation of the Cecilia framework for the mental health and well-being domain, we must compute $\max C(V - \{s\}, j, 15, 5)$ over all $j \in V - \{s\}$, where our distinguished vertex $s$ is a 'greetings' AI-task micro-dialogue, 15 is the estimated time (in minutes) that a dialogue session may last, and 5 is the number of different interaction tasks for the student. These constants were recommended as fixed numbers by a specialized psychological therapist, in order to compose a comfortable dialogue session for the student.
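The recursive definition can be sketched directly as a memoized Python function. This is a hedged illustration rather than the implementation used in Cecilia. It assumes that $S$ is exactly the set of vertices used by the sequence (so $l = |S|$ and the fourth argument can be dropped) and that the optimum is taken over subsets of size $L$ containing $s$. The toy instance and the function name `ccp_optimal_value` are invented, and edges missing from `P_E` are assumed to have profit 0.

```python
from functools import lru_cache
from itertools import combinations

NEG_INF = float("-inf")

# Toy instance (illustrative assumptions, not data from the paper).
V = ["greet", "breath", "element", "poem"]
P_V = {"greet": 1, "breath": 5, "element": 4, "poem": 3}
W_V = {"greet": 1, "breath": 3, "element": 2, "poem": 2}
P_E = {
    ("greet", "breath"): 2, ("greet", "element"): 1,
    ("breath", "element"): 3, ("element", "breath"): 1,
    ("breath", "poem"): 2, ("element", "poem"): 4,
    ("poem", "element"): 1,
}


def ccp_optimal_value(vertices, profit_v, profit_e, weight_v, s, L, K):
    """Best profit of a sequence of L distinct vertices starting at s, size <= K."""

    @lru_cache(maxsize=None)
    def C(S, j, k):
        # S: frozenset of the vertices used so far (contains s and j).
        if weight_v[j] > k:
            return NEG_INF              # vertex j does not fit in capacity k
        if S == frozenset([s]):
            return profit_v[s]          # base case: the sequence is just [s]
        if j == s:
            return NEG_INF              # a longer sequence cannot end at s
        rest = S - {j}
        best_prev = max(
            C(rest, i, k - weight_v[j]) + profit_e.get((i, j), 0)
            for i in rest
        )
        return best_prev + profit_v[j]

    if L == 1:
        return C(frozenset([s]), s, K)
    others = [v for v in vertices if v != s]
    return max(
        (C(frozenset((s,) + rest), j, K)
         for rest in combinations(others, L - 1) for j in rest),
        default=NEG_INF,
    )


print(ccp_optimal_value(V, P_V, P_E, W_V, s="greet", L=3, K=6))
```

On the toy instance this returns 15 for $K = 6$ and 13 for $K = 5$, and $-\infty$ when no sequence of three vertices fits in the capacity.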

#### **4.3 Dynamic Programming Algorithm**

Based on the recursive definition above, Algorithm 1 computes the optimal solution for a given CCP instance. A dynamic programming strategy is used to avoid duplicated recursive calls, using a memory table $C(S, j, k)$, where $S \subseteq V$, $j \in V$ and $0 \le k \le K$. In this case, for a given CCP instance, the optimal solution is $\max C(S, j, K)$ over all $S \subseteq V$ with $|S| = L$ and $j \in V - \{s\}$.

**Algorithm 1.** Creative Composition Problem (CCP) by dynamic programming

```
1: function CCP(L, K, V, E, PV, PE, WV, s, C)
2:     Opt = −∞
3:     for k = 0 to K do
4:         if (WV(s) ≤ k) then
5:             C({s}, s, k) = PV(s)
6:         else
7:             C({s}, s, k) = −∞
8:     for c = 2 to L do
9:         for all S s.t. S ⊆ V, |S| = c, s ∈ S do
10:            C(S, s, k) = −∞ s.t. 0 ≤ k ≤ K, k ∈ N ∪ {0}
11:            for all j ∈ S, j ≠ s do
12:                for k = 0 to K do
13:                    if (WV(j) ≤ k) then
14:                        C(S, j, k) = max{C(S−{j}, i, k−WV(j)) + PV(j) + PE((i, j)) : i ∈ S, i ≠ j}
15:                        if c = L then Opt = max(Opt, C(S, j, k))
16:                    else
17:                        C(S, j, k) = −∞
18:    return Opt
```
Observe that lines 8–9 of Algorithm 1 can easily be programmed as a single iteration if the subsets of fixed cardinality are precomputed. Algorithm 2 presents the pseudocode to recover all the feasible solutions that are optimal for a given CCP instance. It uses the traditional backtracking strategy over the memory table, as is usual in dynamic programming techniques.
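The recursion and the backtracking step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the instance of Appendix A with vertices numbered 1 to 9 and start vertex $s = 1$. The function names `solve_ccp`, `recover` and `optimal_sequences` are hypothetical, and the backtracking routine is only one plausible reading of Algorithm 2, whose pseudocode is not reproduced in the text.

```python
from itertools import combinations

NEG_INF = float("-inf")

# Appendix A instance (assumption: vertices are numbered 1..9 and s = 1).
W = [1, 1, 1, 1, 1, 6, 11, 15, 7]
PV = [5, 2, 7, 12, 7, 1, 12, 12, 6]
PE = [
    [16, 13, 19, 15, 3, 17, 19, 6, 9],
    [0, 1, 1, 13, 15, 19, 12, 2, 17],
    [5, 14, 7, 6, 0, 9, 0, 0, 16],
    [3, 5, 5, 8, 13, 18, 19, 8, 14],
    [8, 19, 0, 17, 19, 13, 18, 5, 8],
    [9, 9, 3, 6, 6, 9, 13, 12, 9],
    [15, 4, 1, 11, 7, 6, 17, 7, 0],
    [7, 7, 0, 1, 7, 0, 13, 5, 11],
    [6, 3, 8, 7, 13, 18, 10, 11, 4],
]
VERTICES = list(range(1, 10))
w = {v: W[v - 1] for v in VERTICES}
pv = {v: PV[v - 1] for v in VERTICES}
pe = {(i, j): PE[i - 1][j - 1] for i in VERTICES for j in VERTICES}


def solve_ccp(L, K, s):
    """Fill the memory table C(S, j, k) bottom-up and return (optimum, table)."""
    C = {}
    for k in range(K + 1):
        C[(frozenset([s]), s, k)] = pv[s] if w[s] <= k else NEG_INF
    opt = NEG_INF
    others = [v for v in VERTICES if v != s]
    for c in range(2, L + 1):
        # Lines 8-9 of Algorithm 1: all subsets of size c that contain s.
        for rest in combinations(others, c - 1):
            S = frozenset(rest) | {s}
            for k in range(K + 1):
                C[(S, s, k)] = NEG_INF  # a longer sequence never ends at s
            for j in rest:
                prev = S - {j}
                for k in range(K + 1):
                    best = NEG_INF
                    if w[j] <= k:
                        for i in prev:
                            cand = C[(prev, i, k - w[j])] + pv[j] + pe[(i, j)]
                            best = max(best, cand)
                    C[(S, j, k)] = best
                    opt = max(opt, best)
    return opt, C


def recover(C, S, j, k, s):
    """Yield every sequence that attains the finite table value C[(S, j, k)]."""
    if S == frozenset([s]):
        yield [s]
        return
    prev = S - {j}
    for i in prev:
        if C[(prev, i, k - w[j])] + pv[j] + pe[(i, j)] == C[(S, j, k)]:
            for seq in recover(C, prev, i, k - w[j], s):
                yield seq + [j]


def optimal_sequences(opt, C, K, s):
    """Collect all sequences whose table entry at capacity K equals the optimum."""
    found = set()
    for (S, j, k), value in C.items():
        if k == K and value == opt and value != NEG_INF:
            found.update(tuple(seq) for seq in recover(C, S, j, k, s))
    return sorted(found)


opt, table = solve_ccp(L=4, K=15, s=1)
sequences = optimal_sequences(opt, table, K=15, s=1)
print(opt, sequences)
```

On this instance the sketch should reproduce the optimum 82 and the sequence (1, 4, 5, 7) reported in Appendix B. The table is keyed by `frozenset` so that subsets can serve as dictionary keys.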

#### **4.4 Computational Complexity of Dynamic Programming Algorithm to Compute the Optimal CCP Solution**

Given a CCP instance, we would like to estimate the time complexity of computing an optimal solution. When a problem is NP-hard, the computation time may be intractable in terms of actual machine run time [20].

Sometimes an NP-hard problem can be parametrized in order to achieve polynomial-time computation. This is the case when an algorithm whose complexity is of exponential or factorial order (or greater), as is common for combinatorial NP-hard problems, becomes computable in polynomial time once one of the arguments of the input instance is fixed as a constant [20].

**Algorithm 2.** Recovers all the Optimal Composition Sequences

It can easily be seen that computing a CCP optimal solution, for a given CCP instance, combines two well-known combinatorial problems, *The Travelling Salesman Problem* and *The Knapsack Problem*; see [13]. The CCP optimal composition sequence requires computing a 'Hamiltonian path' of fixed length in which the cost of the edges is maximized, while the vertices are selected subject to a knapsack capacity constraint (as in the knapsack problem definition) and the profit of the vertices is also maximized. Since computing an optimal solution for a given CCP instance is a combinatorial problem, this gives an exponential computation time.

Note that Algorithm 1 requires more computation than Algorithm 2, so we focus on Algorithm 1 to estimate the time complexity.

The iterative statements on lines 8–12 dominate the iteration on line 3 in computation time. The cost of the 'for' statements on lines 8–9 can be expressed as the number of permutations $P(|V|, L)$, the cost of the 'for' statement on line 11 as $|V| - 1$, and the cost of the 'for' statement on line 12 as $K$. Therefore the estimated time complexity of computing an optimal CCP solution is $O(V, L, K) = P(|V|, L) \cdot |V| \cdot K$.

However, since the arguments $L$ and $K$ are bounded by fixed constants, the computation takes polynomial time in $|V|$.

Specifically, after receiving guidance from a psychologist and other mindfulness experts, many short dialogue sessions are suggested rather than a long one. Setting the parameters $L = 5$ (five tasks per dialogue session) and $K = 15$ (15 min for a whole dialogue session) therefore seems to be a recommended measure. This does not exclude recommending to the student some relatively long exercise (20–30 minutes) that he can do on his own.

Since the suggested $L$ is fixed to a value of 5, a naive strategy would require $P(|V|, 5)$ permutations, that is, a polynomial of degree 5, which is still expensive for a large $|V|$.

For our mental health and well-being instantiated domain in Cecilia, MAIC constructs a Knowledge Graph with $|V| \leq 20$. This is possible due to the logical theories in the *Diagnostic Module*, and also due to the structure of the knowledge present in our enriching talks (mild therapies) domain when it is formally represented in mathematical logic by LP. Each of the mild-therapy theories presents a partial order structure as a relation between stages of progress in the acquisition of skills; for instance, Mindfulness requires an ordered sequence of stages.

Then for the worst case we would have $P(20, 5) = 15504$, and for a worst case where $L = 5$ and $K = 15$ we have $O(V, 5, 15) = 15504 \cdot 5 \cdot 15$, that is, around 1,000,000, which is still tractable in computation time.
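The arithmetic behind this estimate can be checked with a few lines of Python. The sketch only reproduces the numbers. It relies on the observation that 15504 coincides with the number of 5-element subsets of a 20-element set, while the number of ordered selections is larger. Reading $P(20, 5)$ as the subset count is an assumption made here, not a statement from the text.

```python
import math

V, L, K = 20, 5, 15

subsets = math.comb(V, L)   # unordered 5-element subsets of 20 vertices
ordered = math.perm(V, L)   # ordered selections of 5 out of 20 vertices
estimate = subsets * L * K  # the product quoted in the text

print(subsets)   # 15504
print(ordered)   # 1860480
print(estimate)  # 1162800
```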

A trade-off w.r.t. $|V|$ could be, in average cases, a fixed value of 15, which can also be considered feasible in terms of run time. Moreover, diagnosis and prescription are not computed in real time during the session, but between sessions.

It is always possible to relax the problem and use, for example, greedy techniques to obtain feasible solutions close to the optimum for much larger instances. For example, a strategy similar to the one used for the rational (fractional) knapsack problem can be applied, where the ratios between the profits and the costs of the objects are sorted in ascending order, to propose a feasible solution for large inputs. This gets close to the optimal solution with an approximate complexity lower than $O(V, K) = |V| \log(|V|)$, which is lower than 400 for $|V| = 20$, so it could be considered for prescribing over a KG with more than 100,000 vertices (AI-tasks).
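One possible greedy relaxation is sketched below. It is not the authors' method. Starting from $s$, it repeatedly appends the feasible vertex with the best ratio of gained profit (vertex profit plus edge profit from the current vertex) to weight. The function name `greedy_ccp` and this particular ratio are assumptions. This variant runs in $O(L \cdot |V|)$ rather than sorting once, and it is applied to the Appendix A instance with $s = 1$.

```python
# Appendix A instance (assumption: vertices are numbered 1..9 and s = 1).
W = [1, 1, 1, 1, 1, 6, 11, 15, 7]
PV = [5, 2, 7, 12, 7, 1, 12, 12, 6]
PE = [
    [16, 13, 19, 15, 3, 17, 19, 6, 9],
    [0, 1, 1, 13, 15, 19, 12, 2, 17],
    [5, 14, 7, 6, 0, 9, 0, 0, 16],
    [3, 5, 5, 8, 13, 18, 19, 8, 14],
    [8, 19, 0, 17, 19, 13, 18, 5, 8],
    [9, 9, 3, 6, 6, 9, 13, 12, 9],
    [15, 4, 1, 11, 7, 6, 17, 7, 0],
    [7, 7, 0, 1, 7, 0, 13, 5, 11],
    [6, 3, 8, 7, 13, 18, 10, 11, 4],
]
VERTICES = list(range(1, 10))


def gain(i, j):
    """Profit added by moving from vertex i to vertex j."""
    return PV[j - 1] + PE[i - 1][j - 1]


def greedy_ccp(L, K, s):
    """Build a feasible sequence of at most L vertices by a ratio-greedy rule."""
    if W[s - 1] > K:
        return [], 0
    seq, total, remaining = [s], PV[s - 1], K - W[s - 1]
    while len(seq) < L:
        last = seq[-1]
        feasible = [j for j in VERTICES
                    if j not in seq and W[j - 1] <= remaining]
        if not feasible:
            break
        # Highest gained profit per unit of weight wins.
        best = max(feasible, key=lambda j: gain(last, j) / W[j - 1])
        total += gain(last, best)
        remaining -= W[best - 1]
        seq.append(best)
    return seq, total


greedy_seq, greedy_value = greedy_ccp(L=4, K=15, s=1)
print(greedy_seq, greedy_value)
```

On this instance the greedy rule yields the sequence (1, 4, 5, 2) with value 73, against the optimum of 82 reported in Appendix B. A greedy answer is feasible but not guaranteed to be optimal.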

#### **4.5 Running Example**

Appendix A gives an example of a CCP graph instance. Appendix B shows how the DCP is computed using Algorithm 1 and Algorithm 2. Note that the computation is organised as a table in which the sets grow in cardinality, so that the recursive function *C* for a DCP optimal solution is computed from the memory table entries of the simpler cases calculated before.

## **5 Creative Reasoning-Planning: The Master-Agent Artificial Intelligent Composer (MAIC) of Dialogue Scripts for Well-Being and Mental Health**

Conceptually, the MAIC in Cecilia reasons using Answer Set Programming (ASP) [27,82] and consists of the two modules described in Sect. 3. The first module, *Diagnoses - Reasons*, is based on a complex theory in an LP KB and composes an instance of the CCP KG: the diagnosis defines and constructs the Knowledge Graph that is the input of the CCP as PCDC problem presented in Sect. 4. Further details are discussed in this section. The second module, *Prescribes*, computes an optimal solution for the CCP as PCDC KG instance, optimizing the mental health and well-being of the student as well as the dialogue session interactions.

#### **5.1 The MAIC Diagnostic: Enriching Talks (Mild Therapies) Theories Specified in ASP**

Seven logic programming theories under Answer Set Programming semantics have been defined for this project to model the student profile and to create a dialogue composition proposal for each session with the student.


#### **5.2 The MAIC Prescription and Recommendation: Solving the Creative Composition Problem (CCP)**

The CCP as PCDC optimization problem presented in Sect. 4 can be solved by the dynamic programming algorithm given there. It can also be encoded naturally in logic programming; for instance, it can easily be encoded for the CIAO [32], DLV [29] and CLASP [26] solvers.

The CCP of Sect. 4 is actually a stratified logic program, and hence we can informally say that it is logically a very simple program. The three encodings (CIAO, DLV, CLASP) are almost the same, with minor changes in coding details. The CCP can also be encoded with the recursive approach presented in Sect. 4 using APOL. APOL [64] is a partial order programming system [67,69], very similar to mathematical programming, where a function is minimized (or maximized) subject to a set of restrictions; the difference is that the domain of values is a partial order, and partial order clauses can be expressed as normal clauses. APOL is an extension of ASP that allows optimization problems to be expressed in a very suitable way, integrating disjunctive clauses and partial-order clauses. It performs a dynamic programming algorithm and interacts with DLV [23]. On the other hand, there is also an implementation of partial order programming following a standard top-down approach [36].

**Defining Profits.** Recall that the definition of a CCP as PCDC instance involves profits. One kind is associated with each AI-task asset; recall that each AI-task asset is a micro-dialogue. The other kind is associated with every pair of micro-dialogues, with the intended meaning of measuring the coherence, enjoyment and smoothness of a session.

The first type of profit, assigned to micro-dialogues, is defined by means of a logical theory in ASP that takes into account previous answers of the user. For example, suppose the student has anxiety and has told Cecilia that a suggested mindfulness exercise *A* has been of benefit to him. Then the MAIC, by its ASP theory, would assign a value *v*1. However, let us also assume that he has previously performed a mindfulness exercise *B* and has been sceptical regarding the usefulness of that exercise. Then the MAIC, by its ASP theory, assigns to micro-dialogue *B* a value *v*2 smaller than *v*1. For instance, *v*2 could be 1 and *v*1 could be 8. These values are adjusted through further interactions between Cecilia and the student, and also through a semi-automatic process using Machine Learning, especially Inductive Logic Programming (this point is still outside the scope of this paper; for the moment we have fixed rules stated with the endorsement of an expert psychologist).

For the second type of profit (not yet considered in this work), we assume that it would be obtained by a learning process, possibly using Machine Learning, especially Inductive Logic Programming. It would consist of a combination of a priori rules stated by a psychologist, Machine Learning rules and the answers of the student. The rules would be derived from a pilot group of students interacting with Cecilia, generalizing the concluded rules for the profit assignment of the micro-dialogues.

The transition from reasoning about the theories that represent domain knowledge to the Knowledge Graph CCP (DCP) instance generated by that reasoning is made by rules of the following structure, written in LP under Answer Set Programming semantics:

To assign profit to an AI-task:

$$\mathit{vertices}(v(x_i), P(x_i)) \;\text{:-}\; \mathit{condition}^{+}_{m},\ \mathit{not}\ \mathit{condition}^{-}_{n}.$$

To assign profit between two AI-tasks:

$$\mathit{edges}(e(x_i, x_j), P_E(x_i, x_j)) \;\text{:-}\; \mathit{condition}^{+}_{m},\ \mathit{not}\ \mathit{condition}^{-}_{n}.$$

where $\mathit{condition}^{+}_{m}$ and $\mathit{not}\ \mathit{condition}^{-}_{n}$ are predicates inferred from the ASP Knowledge Base representing the instantiated domain knowledge.
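To illustrate how a rule body of this shape behaves, the following Python sketch mimics the evaluation of positive and default-negated conditions against a fixed set of facts. It is not an ASP solver and not the authors' encoding. All fact names, rule entries and the helper `body_holds` are hypothetical, and only the values 8 and 1 echo the example for exercises *A* and *B* given earlier.

```python
def body_holds(facts, positive, negative):
    """A body holds if every positive condition is a known fact
    and none of the negated conditions is."""
    return positive <= facts and not (negative & facts)


# Hypothetical facts standing in for what a diagnostic theory might derive.
facts = {"anxiety", "helped_by(A)", "sceptical_of(B)"}

# Hypothetical rules: (AI-task, profit, positive conditions, negated conditions).
rules = [
    ("A", 8, {"anxiety", "helped_by(A)"}, {"sceptical_of(A)"}),
    ("B", 1, {"anxiety", "sceptical_of(B)"}, {"helped_by(B)"}),
    ("C", 5, {"insomnia"}, set()),
]

# Only the rules whose body holds contribute a vertex with its profit.
vertex_profit = {
    task: profit
    for task, profit, positive, negative in rules
    if body_holds(facts, positive, negative)
}
print(vertex_profit)
```

Here the rule for the hypothetical task C does not fire because its positive condition is not among the facts, so only A and B receive a profit.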

## **6 Pre-evaluation of Cecilia**

Cecilia was pre-evaluated by bachelor students. The pre-evaluation asked the students about their appreciation of Cecilia's conversations.

The software was tested through the following steps:


The students did not chat with the agent directly, so this is an indirect pre-evaluation.

The pre-evaluation used a test with the following target aspects on which to obtain feedback from the students: 1) creativity, 2) easy to read/learn, 3) interesting, 4) supportive, 5) good, 6) easy, 7) motivating, 8) clear and 9) friendly. A discrete scale between 1 and 7 was used, where 1 means the worst behaviour and 7 the best behaviour. On average, the result was a value of 6 for each aspect considered. Seven examples, one for each student, were pre-evaluated.

Table 1 shows the pre-evaluation results obtained from the students.


**Table 1.** Pre-evaluation retrieval obtained from the students

Three comments of the students about the conversations are the following:

*"I really like what I read, basically because I learn a lot of thing besides the logical exercises, I like history, and I like a little of literature with the analogy of the ying-yang and the poem, subjects that I am really in love, it's too interesting to appreciate these subjects to be combined. It makes the learning process to be much funny, that's motivated me to change my attitude talking about maths, I know maths, I just need to practice, It's like anything else, you have to practice to be a master, there's not other way. We are the only ones who are responsibles of develop our knowledge, we already have it! :)"*

*"It was an interesting conversation, and it helped me to better understand logical connectives. The conversation was very friendly and I liked how the concepts are simplified. Also, it was very easy to read."*

*"Very interesting I love it!"*

Figure 6 shows an example of the Cecilia GUI application. The application currently uses Spanish; it will be translated into an English version. Cecilia is designed to be independent of the domain of the knowledge scripts (for example, the use of Enriching Talks). Cecilia is also independent of the human language used in the dialogue with the Agent User (in our case, the student).


**Fig. 6.** Example of Cecilia's GUI

## **7 Technologies Suitable to Solve CCP and to Implement the Design of Cecilia Architecture**

Another major aim of this paper is to justify the use of ASP, in addition to the justification given in the previous subsection.

We also propose ASP for the following reasons.

– Flexibility to represent all major aspects of the Belief Model of the student in different forms. For instance, in a previous work [63] we used a standard Generate/Test technique to represent our problem. Here we use an optimization problem. Both forms were easily encoded in ASP. Default rules were very helpful in both cases. In this second approach the well-founded semantics was sufficient to express our problem. However, adding integrity constraints was useful to ensure the correctness of our approach. When the system becomes inconsistent, mainly because it has exhausted all the resources it has, we have a fixed default plan to propose.


Following Gupta's advice, complex applications, as proposed in this work, will become possible if all these extensions were combined into a single system [30].

## **8 Conclusions**

The contribution of this work<sup>1</sup> is to define the Creative Composition Problem (CCP) for human well-being optimization by construction of a Knowledge Graph, using knowledge representation and logic-based Artificial Intelligence reasoning-planning, where the optimal solution is computed by Dynamic Programming or Logic Programming. The Creative Composition Problem is embedded within Cecilia<sup>2</sup>, an architecture of a digital companion artificial intelligence agent system that composes dialogue scripts for well-being and mental health. The Cecilia framework is instantiated in the well-being and mental health domain for the optimal well-being development of first-year university students. We define 'The Problem of Creating a Dialogue Composition (PCDC)' and propose a feasible and optimal solution for it. CCP is instantiated in this applied domain to solve PCDC, optimizing the mental health and well-being of the student. CCP as PCDC is applied to maximize the mental health of the student and also the smoothness, coherence, enjoyment and engagement each time a dialogue session is composed.

For future work, the optimization of mental health and well-being can be enhanced by sentiment analysis. It is possible to use set covering to classify patterns, where tests of properties can separate emotions and the number of tests is minimized by mapping them to a set covering problem. For this, set covering [77] and minimal cut [72] algorithms can be used. Note that this kind of combinatorial problem is easily encoded in ASP. MAIC can also be enhanced with Logic Programming integrating Preferences and Optimization [7,8,53]. Following Gupta's advice, complex applications as proposed in this work will become possible if all these extensions were combined into a single system [30]. In a recent paper, we investigated how to generate class notes for the development of psycho-affective learning based on a methodology similar to the one presented in this paper, namely the "Creative Composition Problem"; see [10]. For future work we also consider exploring the idea of representing knowledge using alternative non-monotonic paradigms (besides ASP) such as those found in [11,57,58,66,68,69].

As Cindy Mason stated in [49], the mechanisms for reasoning with regard to another's feelings only make sense if there is wisdom to go along with them. This is a very important point. For a machine to engage in our world with a compassionate stance, we are faced with the task of articulating the common sense of compassion. Not all engineers and scientists are born with the gift for empathy, sympathy or compassion. We require collaboration with educators, psychologists, mothers, priests, our pets and even the kindness of strangers, to achieve the level of interaction that would enable the compassionate stance in a computational machine. The idea of programming our interfaces and embodied agents with a compassionate stance has great potential for positive influence in our cultures. This is why in our future work we will be integrating assessment from other disciplines to improve the development of compassion in our research work [48,50].

<sup>1</sup> We thank the support of Psychologist Andres Munguia Barcenas.

<sup>2</sup> The Cecilia application is available at https://github.com/luis-angel-montiel-moreno/efriend under the name E-friend.

#### **A Appendix 1**

*% The following is an example of a CCP Graph Instance.*
*% The input format consists of the numeric constants: number of vertices, L, K,*
*% followed by two vectors W and P V and one matrix P E.*

```
num vertixes = 9 .
L = 4 .
K = 15 .
#       1  2  3  4  5  6  7  8  9
W:      1  1  1  1  1  6 11 15  7
P V :   5  2  7 12  7  1 12 12  6
P E :
#       1  2  3  4  5  6  7  8  9
1      16 13 19 15  3 17 19  6  9
2       0  1  1 13 15 19 12  2 17
3       5 14  7  6  0  9  0  0 16
4       3  5  5  8 13 18 19  8 14
5       8 19  0 17 19 13 18  5  8
6       9  9  3  6  6  9 13 12  9
7      15  4  1 11  7  6 17  7  0
8       7  7  0  1  7  0 13  5 11
9       6  3  8  7 13 18 10 11  4
```
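As a quick check of this instance, the following Python sketch scores a given vertex sequence by summing vertex weights, vertex profits and the profits of consecutive edges. The function name `score` is hypothetical, and the 1-based reading of the row and column labels is an assumption. It is applied to the sequence (1, 4, 5, 7) reported in Appendix B.

```python
# Appendix A instance (assumption: rows and columns are numbered 1..9).
W = [1, 1, 1, 1, 1, 6, 11, 15, 7]
PV = [5, 2, 7, 12, 7, 1, 12, 12, 6]
PE = [
    [16, 13, 19, 15, 3, 17, 19, 6, 9],
    [0, 1, 1, 13, 15, 19, 12, 2, 17],
    [5, 14, 7, 6, 0, 9, 0, 0, 16],
    [3, 5, 5, 8, 13, 18, 19, 8, 14],
    [8, 19, 0, 17, 19, 13, 18, 5, 8],
    [9, 9, 3, 6, 6, 9, 13, 12, 9],
    [15, 4, 1, 11, 7, 6, 17, 7, 0],
    [7, 7, 0, 1, 7, 0, 13, 5, 11],
    [6, 3, 8, 7, 13, 18, 10, 11, 4],
]


def score(sequence):
    """Return (total weight, total profit) of a sequence of 1-based vertex labels."""
    weight = sum(W[v - 1] for v in sequence)
    profit = sum(PV[v - 1] for v in sequence)
    # Add the edge profit of every pair of consecutive vertices.
    profit += sum(PE[i - 1][j - 1] for i, j in zip(sequence, sequence[1:]))
    return weight, profit


weight, profit = score((1, 4, 5, 7))
print(weight, profit)  # 14 82
```

The weight 14 respects the capacity $K = 15$, and the profit 82 agrees with the optimal value listed in Appendix B.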
#### **B Appendix 2**

```
dcp function is denoted as c
s={1 , 9}
c(s , 8 , 8) = 20 , c(s , 8 , 9) = 20 , c(s , 8 , 10) = 20 , c(s , 8 , 11) = 20 , c(s , 8 , 12) = 20 , c(s , 8 , 13) = 20 , c(s , 8 , 14) = 20 ,
         c(s , 8 , 15) = 20 ,
s={1 , 7}
c(s , 6 , 12) = 36 , c(s , 6 , 13) = 36 , c(s , 6 , 14) = 36 , c(s , 6 , 15) = 36 ,
...
s={1 , 6 , 9}
c(s , 5 , 14) = 39 , c(s , 5 , 15) = 39 , c(s , 8 , 14) = 38 , c(s , 8 , 15) = 38 ,
s={1 , 5 , 9}
c(s , 4 , 9) = 40 , c(s , 4 , 10) = 40 , c(s , 4 , 11) = 40 , c(s , 4 , 12) = 40 , c(s , 4 , 13) = 40 , c(s , 4 , 14) = 40 , c(s , 4 , 15) = 40 ,
         c(s , 8 , 9) = 29 , c(s , 8 , 10) = 29 , c(s , 8 , 11) = 29 , c(s , 8 , 12) = 29 , c(s , 8 , 13) = 29 , c(s , 8 , 14) = 29 , c(s , 8 , 15) =
         29 ,
...
s={1 , 5 , 6 , 9}
c(s , 4 , 15) = 58 , c(s , 5 , 15) = 54 , c(s , 8 , 15) = 50 ,
s={1 , 4 , 6 , 9}
c(s , 3 , 15) = 57 , c(s , 5 , 15) = 71 , c(s , 8 , 15) = 66 ,
...
s={1 , 2 , 3 , 4}
c(s , 1 , 4) = 60 , c(s , 1 , 5) = 60 , c(s , 1 , 6) = 60 , c(s , 1 , 7) = 60 , c(s , 1 , 8) = 60 , c(s , 1 , 9) = 60 , c(s , 1 , 10) = 60 ,
         c(s , 1 , 11) = 60 , c(s , 1 , 12) = 60 , c(s , 1 , 13) = 60 , c(s , 1 , 14) = 60 , c(s , 1 , 15) = 60 , c(s , 2 , 4) = 57 , c(s , 2 , 5) =
         57 , c(s , 2 , 6) = 57 , c(s , 2 , 7) = 57 , c(s , 2 , 8) = 57 , c(s , 2 , 9) = 57 , c(s , 2 , 10) = 57 , c(s , 2 , 11) = 57 , c(s , 2 , 12) =
         57 , c(s , 2 , 13) = 57 , c(s , 2 , 14) = 57 , c(s , 2 , 15) = 57 , c(s , 3 , 4) = 72 , c(s , 3 , 5) = 72 , c(s , 3 , 6) = 72 , c(s , 3 , 7) =
         72 , c(s , 3 , 8) = 72 , c(s , 3 , 9) = 72 , c(s , 3 , 10) = 72 , c(s , 3 , 11) = 72 , c(s , 3 , 12) = 72 , c(s , 3 , 13) = 72 , c(s , 3 , 14)
         = 72 , c(s , 3 , 15) = 72 ,
∗∗∗∗
optimal solution of dcp
82
optimal dcp sequence (1 , 4 , 5 , 7)
```
#### **References**

1. Arias, J., Carro, M., Chen, Z., Gupta, G.: Constraint answer set programming without grounding and its applications. In: Datalog 2.0 2019–3rd International Workshop on the Resurgence of Datalog in Academia and Industry co-located with the 15th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR 2019) at the Philadelphia Logic Week 2019, Philadelphia, PA (USA), 4–5 June 2019, pp. 22–26 (2019). http://ceur-ws.org/Vol-2368/paper2.pdf


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Set Visualisations with Euler and Hasse Diagrams**

Uta Priss

Zentrum für erfolgreiches Lehren und Lernen, Ostfalia University, Wolfenbüttel, Germany
http://www.upriss.org.uk

**Abstract.** This paper discusses set visualisations with concept lattices in the sense of Formal Concept Analysis (FCA) in contrast to visualisations with Euler diagrams. Both types of visualisations have advantages and disadvantages. Because of the connection between both fields and the body of knowledge that exists in both fields it is of interest to investigate whether results from either field can contribute to the other.

## **1 Introduction**

Sets and their intersections can be visualised with Venn and Euler diagrams but also using mathematical lattice theory and a certain type of diagram (*Hasse diagram*) that is commonly used with lattices. It is therefore of interest to compare Euler and Hasse diagrams both with respect to what can be observed from the diagrams but also with respect to underlying theoretical constructs. While a translation between lattices and Venn diagrams is straightforward, the connection between *well-formed* Euler diagrams and lattices is not trivial. Lattice theory has produced a large body of knowledge which could potentially be beneficial for research about well-formed Euler diagrams. The research about Venn and Euler diagrams provides, for example, applications and algorithms which could be of interest for Hasse diagrams as well.

The version of lattice theory used in this paper is called Formal Concept Analysis (FCA) and has been developed since the 1980s as an applied mathematical theory of knowledge representation (Ganter and Wille 1999). Venn and Euler diagrams are well-established as a visualisation of sets that is used, for example, in schools when students are first introduced to set theory. Hasse diagrams may be less intuitive at first sight and require some training. Priss (2017) discusses misconceptions that students initially have about Hasse diagrams of concept lattices in general. If restricted to specific tasks, Eklund et al. (2004) show, however, that novice users can be instructed to use Hasse diagrams fairly effectively.

As far as we know, the relationship between FCA and Euler diagrams has so far not been investigated in any great depth<sup>1</sup>. The intention of this paper is to elaborate the basic connections between both fields. This paper provides an introduction to both fields and basic translations between Venn/Euler and Hasse diagrams. It discusses the application of some lattice-theoretical properties to Euler diagrams. We suspect that many researchers from either field are not aware of all of the connections. Because each field has a slightly different focus, it is conceivable that a combination might provide further interesting results. Many questions about the relationship between well-formed Euler diagrams and lattices still remain open.

<sup>1</sup> As evidenced by a query on Google Scholar for "Formal concept analysis" and "Euler diagrams" which retrieves very little.

© The Author(s) 2021

M. Cochez et al. (Eds.): GKR 2020, LNAI 12640, pp. 72–83, 2021.

https://doi.org/10.1007/978-3-030-72308-8\_5

Sections 2 and 3 of this paper provide introductions to Venn, Euler and Hasse diagrams and FCA. Section 4 covers Venn diagrams and their (well-known) relationship to Boolean lattices. Sections 5, 6, 7 discuss different aspects of the relationship between Euler and Hasse diagrams. Although most of the individual mathematical aspects presented in this paper are not new, we believe that the compilation and elaboration of details with respect to the examples presented in this paper is new. A possibly provocative conclusion of this paper is that although many people may find Euler diagrams "intuitive" as a representation of sets, from a structural viewpoint Hasse diagrams are potentially more suitable for visualising set theory than Venn and Euler diagrams.

#### **2 A Brief Introduction to Euler and Venn Diagrams**

Venn and Euler diagrams are a means for graphically representing sets and their intersections and unions. A more detailed introduction and further background is, for example, provided by Rodgers (2014). Venn diagrams contain all possible intersections for a powerset (i.e. set of all subsets of a set). For example, D1 and D2 in Fig. 1 show Venn diagrams for 3 and 4 sets. Venn diagrams for more than 3 sets cannot be represented by only using circles. Euler diagrams are similar to Venn diagrams but exclude zones which are known to be empty.

**Fig. 1.** Venn diagrams (D1 and D2) and non-well-formed Euler Diagrams (D3a, D3b and D4)

The following terminology applies to Venn and Euler diagrams in this paper: Venn and Euler diagrams consist of closed *curves* which have *labels*. *Minimal regions* are the smallest areas in a diagram which are surrounded by lines and not divided further. *Regions* are sets of minimal regions. *Zones* are maximal regions that are within a set of curves and outwith the remaining curves. For a set *L* of curve labels, the notation *E*(*L*) is used in this paper for a *set of zones*. In other words, *E*(*L*) is a subset of the powerset of *L* that corresponds to the zones of an Euler diagram.
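The notion of a set of zones can be illustrated with a short Python sketch. It assumes that each curve label comes with an explicit set of elements, which the paper treats as optional, and it derives the non-empty zones from element membership. The sets `A`, `B`, `C` and the function name `zones` are hypothetical. The outer zone (elements in none of the sets) is not represented here.

```python
def zones(sets):
    """Map each non-empty zone, given as a frozenset of curve labels,
    to the elements that lie in exactly those curves."""
    universe = set().union(*sets.values())
    result = {}
    for x in universe:
        zone = frozenset(label for label, members in sets.items() if x in members)
        result.setdefault(zone, set()).add(x)
    return result


# Hypothetical labelled sets.
example = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
E = zones(example)

for zone in sorted(E, key=lambda z: (len(z), sorted(z))):
    print(sorted(zone), sorted(E[zone]))
```

For these three sets a Venn diagram would draw all seven zones inside the curves, whereas only five of them are non-empty here; for instance, no element lies in A and C but outside B, so an Euler diagram would omit that zone.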

The reason for distinguishing minimal regions and zones is that zones are the smallest set-theoretically meaningful areas in a diagram whereas minimal regions are the smallest visible areas in a diagram. In a *well-formed* Euler diagram, zones correspond to minimal regions. Further conditions for being well-formed are defined slightly differently by different authors (e.g. Flower et al. (2008)). In order to be well-formed, a diagram should not contain a zone that is *disconnected* and split into several minimal regions (as in D3b in Fig. 1 where the black region in the middle belongs to the outer region). Diagrams should not contain *n-points* for *n* > 2, that is, points where more than 2 curves cross (as in D3a). Different curves should not be *concurrent* (as in D4). Each curve should have at most one label. Curves should not intersect themselves. There should not be any brushing points where several curves meet without crossing.

In universal algebra or algebraic logic, relationships are established between equational classes and algebraic structures. For example, the powerset of a set with operations ∩, ∪ and complementation corresponds to a Boolean algebra or Boolean lattice which can be defined as an equational class. Any subset of a powerset that is closed under ∩ and ∪ corresponds to a distributive lattice which can also be defined as an equational class. While a single (fairly simple) equation is needed to determine whether a lattice is distributive, no similar simple equation or property has yet been found that determines whether a set *E*(*L*) can be represented as a well-formed Euler diagram. Although it seems visually clear what Euler diagrams are and what they look like, from an algebraic viewpoint well-formed Euler diagrams are neither simple nor intuitive. So far algorithms have been provided for deciding whether an Euler diagram is well-formed (for example, Flower et al. (2008)) but not an equational characterisation.

#### **3 Formal Concept Analysis and Hasse Diagrams**

A brief introduction to FCA is included here. More details can be found in the main FCA textbook by Ganter and Wille (1999). FCA is a theory of knowledge representation that was invented by Rudolf Wille in the 1980s. It provides a mathematical model for conceptual hierarchies using lattice theory. A *formal context* is a triple (*O, A, I*) consisting of a set *O* of *formal objects*, a set *A* of *formal attributes* and a binary relation *I* between them. This paper is only concerned with finite sets. The relation *oIa* is read as "object *o* has attribute *a*". The qualifier "formal" is used because being an object or attribute is a role. The qualifier can be omitted if it is clear what is meant. Formal objects and attributes need not be "real world" objects and attributes in any sense. The left-hand side of Fig. 2 shows an example of a formal context with types of animals as formal objects and "female", "juvenile" and "male" as formal attributes. The right-hand side shows a concept lattice (as defined below) using a visualisation for partially ordered sets called *Hasse diagram*.

Concepts are formed by starting with a set of objects, then collecting all attributes which they have in common and then adding any further objects that also have these attributes. Dually, one can also start with attributes. Formally, all common attributes of a set $O_1 \subseteq O$ of objects are denoted by $O_1' := \{a \in A \mid oIa \text{ for all } o \in O_1\}$. All common objects of a set $A_1 \subseteq A$ of attributes are denoted by $A_1' := \{o \in O \mid oIa \text{ for all } a \in A_1\}$. A *formal concept* is a pair $(O_1, A_1)$ where $O_1 = A_1'$ and $A_1 = O_1'$. The right-hand side of Fig. 2 shows 4 formal concepts. The set $O_1$ of a formal concept $(O_1, A_1)$ is called the concept's *extension*; the set $A_1$ is called the concept's *intension*. For example, ({filly}, {juvenile, female}) is a formal concept with extension {filly} and intension {juvenile, female}. The pair ({calf, lamb}, {juvenile}) is not a formal concept because it fulfils $O_1' = A_1$ but not $A_1' = O_1$. It follows from the definition of the $'$-operation that for any set $S$ of objects or attributes $S' = S'''$ and $S \subseteq S''$.

**Fig. 2.** A formal context and concept lattice
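The two derivation operators and the resulting formal concepts can be sketched in Python as follows. The formal context of Fig. 2 is not reproduced in the text, so the context below is an assumed one. It is chosen to be consistent with the statements above: four formal concepts, ({filly}, {juvenile, female}) is a concept and ({calf, lamb}, {juvenile}) is not. The object "colt" and all function names are assumptions.

```python
from itertools import combinations

# Assumed formal context (O, A, I); the actual context of Fig. 2 may differ.
objects = ["filly", "colt", "calf", "lamb"]
attributes = ["female", "juvenile", "male"]
incidence = {
    ("filly", "female"), ("filly", "juvenile"),
    ("colt", "male"), ("colt", "juvenile"),
    ("calf", "juvenile"),
    ("lamb", "juvenile"),
}


def common_attributes(objs):
    """O1' : all attributes shared by every object in objs."""
    return frozenset(a for a in attributes
                     if all((o, a) in incidence for o in objs))


def common_objects(attrs):
    """A1' : all objects that have every attribute in attrs."""
    return frozenset(o for o in objects
                     if all((o, a) in incidence for a in attrs))


def formal_concepts():
    """Enumerate all concepts by closing every subset of objects (fine for tiny contexts)."""
    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            intent = common_attributes(objs)
            extent = common_objects(intent)
            concepts.add((extent, intent))
    return concepts


concepts = formal_concepts()
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Closing every subset of objects is exponential in the number of objects. It is used here only because it follows the definition directly, and dedicated algorithms exist for larger contexts.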

**Fig. 3.** A concept lattice with minimal labelling

A concept $(O_1, A_1)$ is a subconcept of a concept $(O_2, A_2)$ if $O_1 \subseteq O_2$. This is equivalent to $A_1 \supseteq A_2$ (as can be observed in Fig. 2). The set of formal concepts together with a subconcept ordering forms a mathematical lattice. In a Hasse diagram of a concept lattice, nodes denote concepts and edges connect adjacent concepts according to the subconcept ordering. In Fig. 2 the full concepts are written within the nodes. Figure 3 shows a different concept lattice, this time with *minimal labelling* because each object is written slightly below the lowest concept it belongs to and each attribute is written slightly above the highest concept it belongs to. Such objects/attributes are called *immediate* objects/attributes of a concept in this paper. In the remainder of this paper only minimal labelling is employed. An extension can then be read by collecting all objects on every downwards path from a concept and an intension by collecting all attributes on every upwards path from a concept. The top concept of a lattice has all objects in its extension. It can but does not need to have an attribute in its intension and represents some sort of global or universal concept. The bottom concept has all attributes in its intension and corresponds to some sort of *Null* concept. It can but does not need to have an object in its extension.

In a finite lattice, each set of concepts has an infimum (called *meet* and denoted by ∧) and a supremum (called *join* and denoted by ∨). A meet is the largest shared concept below a set of concepts. Dually, a join is the smallest shared concept above a set of concepts. A concept in a lattice that has exactly one adjacent upper concept (i.e., one edge going up from the node) is called ∧-irreducible and must have at least one immediate attribute. This is the case for all nodes that have immediate attributes in Figs. 2 and 3 except for the top concept in Fig. 2 (with attribute "juvenile") because a top concept is the meet of an empty set and thus ∧-reducible. Dually, a concept with exactly one adjacent lower concept is called ∨-irreducible and must have at least one immediate object. In Fig. 3 the concept with immediate objects {foal, calf, lamb} is ∨-reducible. If the objects {foal, calf, lamb} were removed from the formal context the resulting lattice would still be isomorphic to the one in Fig. 3. But if "filly" or "colt" were removed from the formal context, then the lattice structure would change.

For concept lattices, logical implications amongst attributes can be read from the Hasse diagram because the attributes of a subconcept imply the attributes of a superconcept. For example, in Fig. 2 "male $\Longrightarrow$ juvenile" and in Fig. 3 "female $\wedge$ male $\Longrightarrow$ juvenile $\vee$ adult". It should be cautioned that all statements about concepts and implications are only valid for the formal context to which they belong. For example, "male $\Longrightarrow$ juvenile" is true for Fig. 2 but not for Fig. 3.

#### **4 Venn Diagrams and Boolean Lattices**

Sets naturally have an extensional description by listing elements and an intensional description using logical expressions, for example consisting of labels of other sets together with set-theoretical operations. Thus, one can build formal contexts $(U, L, \in)$ where the formal objects are elements of a (universal) set $U$, the formal attributes are labels (in $L$) corresponding to subsets of $U$ and the incidence relation is the element-of relation ($\in$). The Hasse diagrams below are to be interpreted in that manner. For Venn (or Euler) diagrams only set labels are required; set elements are optional but can be written into zones. In some of the Venn (and Euler) diagrams below, set elements are included in order to emphasise the correspondence between Venn and Hasse diagrams.

In a concept lattice of a context $(U, L, \in)$, the lattice-theoretical $\wedge$-operation correlates with a $\cap$-operation amongst subsets of $U$. For example in Lattice 1 in Fig. 4, $(\{a, b\}, \{X\}) \wedge (\{a, c\}, \{Y\}) = (\{a\}, \{X, Y\})$ corresponds to $\{a, b\} \cap \{a, c\} = X \cap Y$. In such lattices, the lattice-theoretical $\vee$-operation correlates with a $\cap$-operation amongst subsets of $L$. For example, in Lattice 2, $\{Y, Z\} \cap \{Y, W\} = \{Y\}$. In either case, only containment holds for $\cup$-operations. For example, $(\{F, B, a\}, \{Y, Z\}) \vee (\{G, B, a\}, \{Y, W\}) = (\{F, G, B, a, l\}, \{Y\})$ but $(Y \cap Z) \cup (Y \cap W) \subset Y$.
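The correspondence between meet and join on the one hand and intersections of extensions and intensions on the other can be illustrated with a small sketch. The universe and the two sets `X` and `Y` below are assumptions modelled on the Lattice 1 example (the element `d` outside both sets is hypothetical); concepts are represented as (extension, intension) pairs.

```python
# Assumed data in the spirit of Lattice 1: two sets over a four-element universe.
UNIVERSE = {"a", "b", "c", "d"}          # "d" is a hypothetical element outside X and Y
SETS = {"X": {"a", "b"}, "Y": {"a", "c"}}


def extent(labels):
    """Elements of the universe that lie in every set named in `labels`."""
    result = set(UNIVERSE)
    for label in labels:
        result &= SETS[label]
    return frozenset(result)


def intent(elements):
    """Labels of all sets that contain every element of `elements`."""
    wanted = set(elements)
    return frozenset(l for l, members in SETS.items() if wanted <= members)


def meet(c1, c2):
    """Meet: intersect the extensions, then derive the intension."""
    ext = c1[0] & c2[0]
    return (ext, intent(ext))


def join(c1, c2):
    """Join: intersect the intensions, then derive the extension."""
    common = c1[1] & c2[1]
    return (extent(common), common)


concept_x = (frozenset({"a", "b"}), frozenset({"X"}))
concept_y = (frozenset({"a", "c"}), frozenset({"Y"}))
```

Here `meet(concept_x, concept_y)` yields the concept with extension {a} and intension {X, Y}. The join has the whole universe as its extension, which strictly contains the union {a, b, c} of the two extensions; this is the sense in which only containment holds for unions.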

Lattices corresponding to Venn diagrams (Fig. 4) are Boolean lattices and contain $2^n$ concepts (for $n$ labels) each of which relates to a zone in a Venn diagram. Their Hasse diagrams form hypercubes. The dotted lines in Lattice 2 correspond to zones in Diagram 2 that are not neighbours in the Venn diagram even though they could (or should) be neighbours. For example: the zone with the immediate object $H$ is in $X \cap Y$. But while it is a neighbour of the zone with the immediate object $o$ (in $X$) it is not a neighbour of the zone with the immediate object $l$ (in $Y$) even though structurally the relationship between $X$ and $X \cap Y$ is isomorphic to the relationship between $Y$ and $X \cap Y$. Thus Lattice 2 shows relationships which are not as easily visible in Diagram 2.

**Fig. 4.** Venn and Hasse diagrams of Boolean lattices

Flower et al. (2008) define a *dual graph* of an Euler diagram as a labelled graph which has a vertex for each zone and an edge if the zones are neighbours. Each edge is labelled by the set labels which distinguish their vertices. For example, an edge between *X* and *X* ∩*Y* is labelled by *Y* . Flower et al. show that for well-formed Euler diagrams, each edge has exactly one label. This condition is called *single-label condition* in the remainder of this paper. A *superdual graph* contains all possible edges with exactly one label. Thus the dual graph of Diagram 2 corresponds to the solid lines in the diagram for Lattice 2 (as an undirected graph) whereas the superdual graph corresponds to the solid together with the dotted lines. A superdual graph represents an abstract set of zones of an Euler diagram that is independent of how the diagram is exactly drawn. The next section shows that not every abstract set of zones of a well-formed Euler diagram forms a lattice and not every lattice corresponds to a set of zones of a well-formed Euler diagram.
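Since a superdual graph depends only on the abstract set of zones, it can be computed directly from the label sets. The sketch below is one possible way to do so, assuming each zone is given as the set of labels of the curves that contain it; the helper `venn_zones` and both function names are illustrative and do not come from Flower et al.

```python
from itertools import combinations


def superdual_edges(zones):
    """Map each pair of zones that differ in exactly one label to that label."""
    zones = [frozenset(z) for z in zones]
    edges = {}
    for z1, z2 in combinations(zones, 2):
        difference = z1 ^ z2               # labels that distinguish the two zones
        if len(difference) == 1:
            edges[(z1, z2)] = next(iter(difference))
    return edges


def venn_zones(labels):
    """All 2^n zones of a Venn diagram over the given labels."""
    labels = list(labels)
    return [frozenset(c)
            for r in range(len(labels) + 1)
            for c in combinations(labels, r)]
```

For the zones of a Venn diagram with two labels, the edge between the zone in $X$ only and the zone in $X \cap Y$ carries the label `Y`, as in the example above. For four labels the result is the edge set of a four-dimensional hypercube. The dual graph of a concrete drawing is a subgraph of this graph and cannot be derived from the zone set alone.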

#### **5 Sets of Zones as Well-Formed Euler Diagrams and Lattices**

In this section a different construction is used for the formal contexts compared to the previous section. For each Euler diagram, a formal context $(E(L), L, \ni)$ is created by taking the set $E(L)$ of the set of labels of each zone as formal objects, the set $L$ of set labels as formal attributes and by defining the incidence relation for $z \in E(L), l \in L$ as follows: $z \ni l :\Longleftrightarrow l$ is an element of the set $z$ of labels. Graphically this is equivalent to $z$ (as a zone) being within curve $l$. Contrary to the construction of $(U, L, \in)$ in the previous section this construction uses zones represented by labels without specifying elements of the sets.

The question arises as to whether any given set of zones $E(L)$ can be represented as a well-formed Euler diagram or a Hasse diagram of a concept lattice. Obviously, the condition for being representable as a concept lattice is that the set of zones must form a lattice. This means that in the context $(E(L), L, \ni)$ the set $E(L)$ must be closed with respect to intersections. If the set of zones itself does not form a lattice, it can still be embedded into a lattice. Constructing a concept lattice for a context $(E(L), L, \ni)$ achieves such an embedding. In the remainder of this paper, any concept in the lattice of $(E(L), L, \ni)$ that is added for the embedding (i.e. does not have an immediate object in the lattice of $(E(L), L, \ni)$) is represented by an empty node in the Hasse diagram and called a *supplemental concept*. Because it does not correspond to a zone and thus does not have an immediate object, the extension of a supplemental concept equals the union of the extensions of its lower neighbouring concepts. If the bottom concept is supplemental (as in Lattice 3 in Fig. 5), its extension is empty.
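The embedding can be sketched computationally. In a context whose objects are the zones themselves (each zone being a set of labels), the intensions of the concepts are the intersections of zones, together with the full label set as the intension of the bottom concept. The following sketch, with illustrative function names and hypothetical example zones, closes a set of zones under intersection and reports the intensions that do not correspond to a zone, i.e. those of the supplemental concepts.

```python
from itertools import combinations


def concept_intents(zones, labels):
    """Intensions of all concepts: the zones closed under intersection,
    plus the full label set (intension of the bottom concept)."""
    intents = {frozenset(labels)}
    intents |= {frozenset(z) for z in zones}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(intents), 2):
            if a & b not in intents:
                intents.add(a & b)
                changed = True
    return intents


def supplemental_intents(zones, labels):
    """Intensions of concepts that have no zone as an immediate object."""
    return concept_intents(zones, labels) - {frozenset(z) for z in zones}
```

For the hypothetical zones `[set(), {"X"}, {"Y"}]` over labels X and Y, only the bottom concept with intension {X, Y} is supplemental. For the zones `[set(), {"X", "Y"}, {"X", "Z"}, {"X", "Y", "Z"}]`, a supplemental concept with intension {X} appears higher up, because the intersection of {X, Y} and {X, Z} is not itself a zone.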

**Fig. 5.** Euler diagrams and concept lattices

Diagram 1 (in Fig. 4) presents a Venn diagram that is also a well-formed Euler diagram and can be represented as a lattice without supplemental concepts (cf. Lattice 1). Lattice 3 and Diagram 3 (in Fig. 5) represent a set of zones which can neither be a well-formed Euler diagram nor a lattice without supplemental concepts. The set of zones in Lattice 4 forms a lattice without supplemental concepts, but does not correspond to a well-formed Euler diagram (because it contradicts the single-label condition). Last but not least, Diagram 5 displays an example of a well-formed Euler diagram which does not correspond to a lattice without supplemental concepts. Thus the examples show that each of the four possible combinations of being or not being a well-formed Euler diagram and being or not being a lattice without supplemental concepts occurs.

Supplemental concepts can occur higher up in the lattice ordering as well. Lattice 6 contains a supplemental concept which is required in order to attach attribute *Y* to a node but this node does not correspond to a zone in Diagram 6. Lattice 6 would still be a lattice even if the supplemental concept was removed. But in that case instead of a curve *Y* , two curves would need to exist, one as a subset of *X* and the other one as a subset of *Z*. Thus a corresponding lattice without a supplemental concept would have one attribute more than Lattice 6. Its corresponding Euler diagram would not be well-formed because the single-label condition would not be fulfilled.

It should be mentioned that adding or deleting a curve can change a well-formed Euler diagram into a non-well-formed one and vice versa. Diagram 6 can be embedded into a well-formed Euler diagram by adding a curve as shown in Diagram 6a. Similarly in Diagram 2, deleting curve *W* or *X* would yield a non-well-formed diagram which, in this case however, can be transformed into a well-formed diagram. For the purposes of this paper this fact about Euler diagrams is stated as the set of well-formed Euler diagrams not being *closed with respect to recursive generation*.

#### **6 Conditions for Well-Formed Euler Diagrams**

It appears to be easier to identify conditions that determine that a set of zones cannot be a well-formed Euler diagram than those that determine that it can be a well-formed Euler diagram. Such conditions from the literature (see below) tend to not use lattice theory. Therefore this section discusses some conditions based on lattice theory.


Condition C2 is relevant for Lattice 4 and the discussion about Lattice 6 above. Because of condition C2, lattices without supplemental concepts that correspond to well-formed Euler diagrams look like they are hypercubes that are glued together. But this is still not a necessary and sufficient condition. Lattice 9 in Fig. 6 does not correspond to a well-formed Euler diagram because the zone {*X,W*} which is shaded in black is disconnected.

A next attempt might be to consider whether distributivity plays a role but Fig. 6 demonstrates that it does not. Lattice 7 is not distributive but Diagram 7 is well-formed. Lattices 8–10 are distributive. Lattices 8 and 10 can be represented as well-formed Euler diagrams (as shown in Diagrams 8 and 10) but Lattice 9 cannot. In the case of a single disconnected zone as in Diagram 9, adding a further zone yields a well-formed diagram as demonstrated for Diagram 6 and 6a, Diagram 9 and 10 and Diagram 5 (modified to correspond to a lattice without supplemental concepts) and Diagram 7. Each represents an example of the set of well-formed Euler diagrams not being closed with respect to recursive generation.

**Fig. 6.** Euler diagrams and distributive lattices

The fact that Lattices 4 and 9 cannot be represented as well-formed Euler diagrams can also be described in terms of irreducible concepts (i.e. concepts that are simultaneously ∧- and ∨-irreducible): in Lattice 4 a meet of three irreducible concepts contradicts the single-label condition and in Lattice 9 two irreducible concepts that have pairwise meets with the same concept and a joined meet cause a similar problem. It might also be of interest to consider well-formed Euler diagrams for lattices such as Lattice 5 where only the bottom concept is a supplemental concept.

Flower et al. (2008) provide further necessary conditions for well-formed Euler diagrams, for example a connectivity condition: a dual graph satisfies the connectivity condition if it is connected, all subgraphs induced by deleting any vertex containing a selected label are connected and all subgraphs induced by deleting any vertex not containing a selected label are also connected. If the bottom node was missing in Lattice 6, then its graph would be disconnected after removal of all concepts which do not contain $Y$. Thus attributes attached to supplemental concepts can be necessary, but not sufficient for the connectivity condition.
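The connectivity condition as stated above lends itself to a direct check. The sketch below is a minimal illustration rather than the procedure of Flower et al.: it takes an arbitrary undirected edge list, and the helper `single_label_edges`, which connects all zones differing in exactly one label, is only an assumed stand-in for the dual graph of an actual drawing. The example zones are hypothetical.

```python
from collections import deque
from itertools import combinations


def single_label_edges(zones):
    """Assumed stand-in for a dual graph: link zones differing in one label."""
    return [(a, b) for a, b in combinations(zones, 2) if len(a ^ b) == 1]


def is_connected(vertices, edges):
    """Breadth-first search on the subgraph induced by `vertices`."""
    vertices = set(vertices)
    if not vertices:
        return True                        # an empty subgraph is treated as connected
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        if a in vertices and b in vertices:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == vertices


def connectivity_condition(zones, edges, labels):
    """Whole graph connected, and for each label both the zones inside
    and the zones outside the corresponding curve are connected."""
    zones = [frozenset(z) for z in zones]
    if not is_connected(zones, edges):
        return False
    for label in labels:
        inside = [z for z in zones if label in z]
        outside = [z for z in zones if label not in z]
        if not (is_connected(inside, edges) and is_connected(outside, edges)):
            return False
    return True
```

With the hypothetical zones ∅, {X}, {Z}, {X, Y} and {Y, Z}, the two zones containing `Y` are not linked, so the condition fails; adding the zone {X, Y, Z} reconnects them and the condition is met.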

A further condition from Flower et al. (2008) is that the dual graph of an Euler diagram must be planar or must be reducible to a planar graph which still passes the connectivity condition. This does not imply that the corresponding Hasse diagram also must be planar because a Hasse diagram is a directed graph whereas a dual graph is undirected. For example Lattice 1 in Fig. 4 is not planar and cannot be converted into a planar Hasse diagram. But if the graph is converted into an undirected graph and the top node (or the bottom node) is placed into the middle then it can be drawn as a planar graph. The same holds for Lattice 2 without the dotted lines. Again, the converse does not hold: Lattice 9 shows an example that fulfils the single-label condition, the connectivity condition and is a planar Hasse diagram but is not drawable as a well-formed Euler diagram. The remaining condition of Flower et al. is a "face condition" which checks the sequence of curve labels around each "face" of a dual graph for a certain property. It is not clear whether and how that could be translated into a lattice-theoretical property.

#### **7 Reading Implications from Euler and Hasse Diagrams**

The question of which Euler diagrams can be drawn as well-formed diagrams is important because well-formed diagrams are presumably easier for users to visually parse than non-well-formed diagrams. A further question about Euler diagrams is what information can be extracted from them so that they can be employed as a tool for information visualisation. In Sect. 3 it was mentioned that implications can be read from concept lattices. The same is true for Euler diagrams. For example, one can read $X \Longrightarrow Y$ and $Y \Longrightarrow Z$ both from Diagram 11 as well as from Lattice 11 (in Fig. 7).
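Reading an implication from a set of zones amounts to checking that every zone inside the premise curves is also inside the conclusion curves. The following sketch illustrates this; the zone list is an assumption describing three nested curves (X inside Y inside Z), chosen to match the implications named above and not copied from Diagram 11.

```python
def implies(zones, premise, conclusion):
    """True if every zone containing all premise labels
    also contains all conclusion labels."""
    premise, conclusion = set(premise), set(conclusion)
    return all(conclusion <= set(zone)
               for zone in zones
               if premise <= set(zone))


# Assumed zones for three nested curves: X inside Y inside Z.
NESTED_ZONES = [set(), {"Z"}, {"Y", "Z"}, {"X", "Y", "Z"}]
```

For these zones `implies(NESTED_ZONES, {"X"}, {"Y"})` and `implies(NESTED_ZONES, {"Y"}, {"Z"})` both hold, whereas `implies(NESTED_ZONES, {"Z"}, {"Y"})` does not, because the zone that lies only in Z is a counterexample.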

Stapleton et al. (2017) use Diagram 12 as an example of an *observational advantage* of Euler diagrams. The diagram shows that $P \cap Q = \emptyset \Rightarrow R \cap Q = \emptyset$. Stapleton et al. argue that Euler diagrams have a maximum observational advantage because any similar set-theoretical statement that is valid for the data in the diagram can be read from the diagram. We argue that Hasse diagrams have an even higher observational advantage than Euler diagrams if one considers further set-theoretical operations.

The implication $P \cap Q = \emptyset \Rightarrow R \cap Q = \emptyset$ can also be observed from Lattices 12a and 12b<sup>2</sup>. Lattices 12a and 12b both contain the implication $R \Rightarrow P$ and the corresponding $R \cap Q \subseteq P \cap Q$. Lattice 12a also contains $P \cap Q \Rightarrow R$ and thus $P \cap Q = R \cap Q$ which is difficult, or impossible, to see in Diagram 12 because it involves a statement about the empty set as a bottom concept which exists in Lattice 12a but is a missing zone in Diagram 12. Lattice 12b contains all intersections that are still possible if the implication $R \Rightarrow P$ is assumed. The supplemental concepts in Lattice 12b correspond to two missing zones in Diagram 12. In Lattice 12b, the implication $P \cap Q = \emptyset \Rightarrow R \cap Q = \emptyset$ is not an intensional implication but an implication that involves $R \cap Q \subseteq P \cap Q$ and the extensional information that $P \cap Q = \emptyset$.
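The reasoning behind this implication can be spelled out as a short derivation. It assumes only that the implication from $R$ to $P$ is read as the containment $R \subseteq P$ and that intersection with $Q$ preserves containment:

$$
\begin{aligned}
R \subseteq P \;&\Longrightarrow\; R \cap Q \subseteq P \cap Q, \\
P \cap Q = \emptyset \;&\Longrightarrow\; R \cap Q \subseteq \emptyset \;\Longrightarrow\; R \cap Q = \emptyset .
\end{aligned}
$$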

While it is possible to observe that zones are missing in an Euler diagram, one can argue that statements that assert that two missing zones are equal (as in Lattice 12a) or involve information about extensions (as in Lattice 12b) cannot be observed from Euler diagrams. Thus one might argue that for someone who can read Hasse diagrams, Lattices 12a and 12b have a higher observational advantage than Diagram 12. Furthermore, Hasse diagrams are not restricted to representing simple relationships amongst sets. Ganter and Obiedkov (2016) discuss many other applications, for example, involving clauses and other more complex logical statements instead of just implications.

<sup>2</sup> It should be noted that implications and their generalisations are well-known in the FCA community and discussed, for example, in the textbook by Ganter and Obiedkov (2016).

It should be mentioned, however, that lattices have the same problem as Venn and Euler diagrams in that they become very difficult to visually parse if they are too large. In cases such as Lattice 4 in Fig. 5 where "many intersections are missing", the lattice is less complex than a Boolean lattice. But in cases such as Lattice 3, a Boolean lattice is required. While it is theoretically possible to draw Hasse diagrams for Boolean lattices of any size, it becomes difficult to see anything in such a lattice for more than 4 sets. Therefore presenting diagrams to users is not necessarily the main goal of FCA applications which instead often use FCA for computational purposes.

**Fig. 7.** Implications amongst set-theoretical statements

### **8 Conclusion**

This paper provides a discussion of representing sets with Hasse diagrams of concept lattices compared to Euler diagrams. The basic relationship between the two types of diagrams is explained. Examples of well-formed Euler diagrams exist that do not correspond to lattices without supplemental concepts and lattices without supplemental concepts exist that do not correspond to well-formed Euler diagrams. Conditions for determining which Euler diagrams can be represented as concept lattices without supplemental concepts are discussed. While some Euler diagrams that are not well-formed can be near impossible to draw, having supplemental concepts in a lattice does not affect how a Hasse diagram is drawn or read. Supplemental concepts serve a purpose with respect to implications. Both Euler diagrams and Hasse diagrams become difficult to read if they get too large. While many people find Euler diagrams much more intuitive to read than Hasse diagrams, the overall expressive power of Hasse diagrams might be higher than that of Euler diagrams. Furthermore, lattice theory can quite likely provide more insights with respect to a theory of well-formed Euler diagrams.

One potentially provocative conclusion of this paper is that well-formed Euler diagrams may not actually be an ideal representation for sets. Set theory is often introduced to students using the visualisation of Venn and Euler diagrams. Thus students may start to think of sets as *being like* Venn and Euler diagrams. But because well-formed Euler diagrams can only represent some subsets of powersets and because it is not clear what the algebraic nature of well-formed Euler diagrams precisely is, one could argue that in some sense Hasse diagrams are more suitable for representing set theory than Euler diagrams.

### **References**



## **Usage Patterns Identification Using Graphs and Machine Learning**

Ovidiu-Dan Sonea(B)

Babes-Bolyai University, Cluj-Napoca, Romania ovidiu.dan.sonea@gmail.com

**Abstract.** In recent years, the number of platforms introducing a subscription plan has been steadily increasing. Subscriptions support the developers and allow them to keep providing quality content. Since many individuals are unwilling or unable to pay, they resort to sharing an account that has a subscription plan. This behavior can, in some instances, be harmful for the developers and, even when it is not, any provider can benefit from knowing what type of clients they have. The solution explored in this article uses data that is easily available and structures it in a way that provides insight into the activity of each account.

## **1 Introduction**

Since sharing credentials is very easy and many people do not see it as a problem, this practice continues to expand. As a result, many content creators are interested in identifying shared accounts, but often it is not enough to know whether an account is shared; content creators also want to know how it is shared. This means that the algorithm must also classify users into patterns that are predefined by the provider to suit their needs. In the end, based on the constraints of each pattern and other metrics calculated, mostly using graph theory, the algorithm provides a sharing probability for each account. Most providers have access to massive amounts of data, which most likely means that they have the necessary tools to identify password sharing; they just have not found an efficient way of processing the data in order to solve this problem. Given that the algorithm, which is detailed in the following sections, uses only information that most content providers already have access to, it can easily be implemented on a large number of platforms.

Although there are several solutions that are implemented, these approaches cannot provide a definitive answer for the problem previously described. A few examples are:


It is necessary to detect usage patterns. Having only the label "shared" or "not shared" is insufficient, because the content creator may want to allow certain types of sharing that do not harm their business. An example from the streaming industry is a teenager who has gone to college and shares an account with their parents.

## **2 The Problem**

The problems that demand a solution are identifying accounts that are shared and classifying users into usage patterns. The end goal is to give providers insights on their subscribers so they can take action on the users from a certain usage pattern. The solution should also be implemented in a reliable and testable way.

## **3 Approach**

In solving this problem we used several graph theory algorithms to structure the data in a way that ensures the validity of our assumptions and assures that the data was not corrupted in a prior step.

The proposed solution will analyse the subscriber's activity during a given time period, classify the subscriber into a known usage pattern and it will provide a password sharing probability. Moving forward, we will describe the capabilities and functions of the proposed solution on an example from the TV industry.

The raw data fed into the algorithm records the activity of real users during one month. This data was collected and provided by a client from the industry and contains the following fields: user id (as defined in the client's database), the coordinates at which an event took place, device type, device id and the time at which an event occurred. All the data gathered in a time interval will be processed and the end result for each user will be:


Based on the detailed criteria, each subscriber will be assigned a single usage pattern and then given an account sharing probability. The table that follows shows how this particular client has chosen to define the patterns in order to extract information they considered valuable. The algorithm allows for the patterns to be defined in many ways without them affecting its performance. There are, however, a few limitations when it comes to defining these usage patterns: the defined patterns must be mutually exclusive, meaning that a subscriber must fit into only one pattern, and the entire pool of subscribers must be covered by the defined patterns (there cannot be subscribers without an assigned pattern).

## **4 Implementation**

The algorithm achieves the intended goal in nine steps (Table 2).

**Step 1.** This step mainly deals with the input processing by reading the raw data from the specified time period and filtering out inputs that might not be relevant for the algorithm.

**Step 2.** After the input is validated, the data must be arranged in a meaningful way for it to provide the desired results. This means grouping entries by user and sorting them in chronological order. To make the algorithm more efficient, this step also compresses the data: multiple consecutive entries from the same device are treated as a single event whose start time is that of the first entry and whose end time is that of the last entry in the consecutive sequence.
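The compression in this step can be sketched as follows for the entries of a single user. The record types `Entry` and `Event` and the function `compress` are hypothetical names, timestamps are assumed to be plain numbers, and locations are left out for brevity; the sketch also assumes that a merged event spans from the first to the last entry of a run.

```python
from itertools import groupby
from typing import NamedTuple


class Entry(NamedTuple):
    """One raw record of a single user (hypothetical layout)."""
    device_id: str
    timestamp: float       # assumed numeric, e.g. seconds since the epoch


class Event(NamedTuple):
    """A run of consecutive entries from the same device."""
    device_id: str
    start: float
    end: float


def compress(entries):
    """Sort chronologically and merge consecutive entries of the same device."""
    ordered = sorted(entries, key=lambda e: e.timestamp)
    events = []
    for device_id, run in groupby(ordered, key=lambda e: e.device_id):
        run = list(run)
        events.append(Event(device_id, run[0].timestamp, run[-1].timestamp))
    return events
```

A device that reappears after another device was used starts a new event, so the chronological alternation between devices is preserved for the later steps.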

**Step 3.** A square matrix with one row and one column per device is created. This matrix tracks which devices are being used by different persons: if such a case is found, the entries of the matrix corresponding to the two devices are marked with "1". Additionally, we create a buffer that contains events which span at most 48 h (we assumed that within this time period one can physically travel between any two points in the United States). In this buffer, we recreate the activity of the subscriber by adding each event from the chronological event array, one by one, and checking whether it is physically possible. We have two ways of analysing whether the activity is done by one person or more. The first is to look at two consecutive events; the second is to check three or more events (the maximum is determined by the number of events in the buffer) and analyse them with a machine learning algorithm (we used an XGBoost classifier with a logistic regression objective). Choosing which consecutive events are analysed (and how) is a challenge by itself, since there can be multiple occurrences of the same device in the buffer. To solve this, we created an occurrence array in which we store the last occurrence of each device. If a device that is already in the buffer is added again, then we analyse with an XGBoost algorithm the loop created by the two devices, as well as the whole buffer. Fig. 1 shows an example.
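The first of the two checks, on pairs of consecutive events, can be illustrated with a simple feasibility test. The paper does not state the criterion it uses, so everything below is an assumption: the great-circle (haversine) distance between two events is compared with the distance coverable at an assumed maximum speed `MAX_SPEED_KMH`. The buffer handling and the machine learning check are not covered.

```python
import math

MAX_SPEED_KMH = 900.0      # assumed upper bound on travel speed; not from the paper
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def physically_possible(ev1, ev2):
    """ev = (lat, lon, time_in_hours); could one person be at both events?"""
    distance = haversine_km(ev1[0], ev1[1], ev2[0], ev2[1])
    hours = abs(ev2[2] - ev1[2])
    return distance <= MAX_SPEED_KMH * hours


def conflict_matrix(events, device_index):
    """events: chronological (device_id, lat, lon, time_in_hours) tuples.
    Marks pairs of devices that must belong to different persons with 1."""
    n = len(device_index)
    matrix = [[0] * n for _ in range(n)]
    for prev, curr in zip(events, events[1:]):
        if prev[0] != curr[0] and not physically_possible(prev[1:], curr[1:]):
            i, j = device_index[prev[0]], device_index[curr[0]]
            matrix[i][j] = matrix[j][i] = 1
    return matrix
```

Under these assumptions, an event in New York followed one hour later by an event on a different device in Los Angeles marks the two devices as used by different persons, whereas the same pair of events ten hours apart does not.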

**Table 2.** List of used terms.

**Fig. 1.** Graphical representation of Step 3

**Step 4.** At this point, we can start calculating the minimum number of persons needed for the subscriber's activity to be physically possible. The matrix from Step 3 can be regarded as the adjacency matrix of a graph: a value of one indicates that the devices corresponding to the row and the column are used by different persons, so in graph-theoretical terms these two vertices are adjacent. The problem can then be solved by a simple graph coloring algorithm [1]. To ensure that an optimal solution is found, we need to apply the algorithm from each vertex since this problem is NP-complete. From an efficiency point of view, this would seem extreme, but we deal with small graphs, and in our experiments we had no issues. On average, in the data we had, the number of devices per account was around four, and only in extreme cases did the count exceed thirty. The end result contains all the optimal colorings of the graph since, in most cases, there is more than one. Translating from graph theory, this means we found the minimum number of persons and all the possible ways of pairing a person with one or more devices.
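For graphs as small as the ones described, the minimum number of persons can also be found by exhaustive search. The sketch below is not the procedure used in the paper; it swaps in plain backtracking that tries one, two, three, ... colours until a proper colouring exists and then returns all colourings with that many colours. The function name is illustrative, and the running time grows exponentially with the number of devices.

```python
def min_persons(conflicts):
    """conflicts: symmetric 0/1 matrix; 1 means two devices need different persons.
    Returns (k, colourings): the smallest number of colours k and every
    proper k-colouring as a tuple mapping device index -> person index."""
    n = len(conflicts)

    def colourings(k):
        assignment = [None] * n
        found = []

        def place(v):
            if v == n:
                found.append(tuple(assignment))
                return
            for colour in range(k):
                # A colour is allowed if no earlier conflicting device has it.
                if all(not conflicts[v][u] or assignment[u] != colour
                       for u in range(v)):
                    assignment[v] = colour
                    place(v + 1)
            assignment[v] = None

        place(0)
        return found

    for k in range(1, n + 1):
        found = colourings(k)
        if found:
            return k, found
    return 0, []
```

Colourings that differ only by renaming the persons are listed separately in this sketch; an implementation that needs distinct pairings would have to remove this symmetry.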

**Step 5.** In this part of the algorithm, we implemented a method that quantifies how connected the activity of a user is. This quantifier is the number of clusters, a term which was previously explained. Since we know the locations visited by a device, we can consider each device as a graph and the visited locations as the vertices of this graph. We then have multiple graphs that may share vertices, but we do not know in advance which vertices are shared. If two graphs have a common vertex, they can be considered as one larger graph. In the end, each remaining graph translates to a cluster. To solve this problem efficiently, we devised an algorithm based on a balanced binary search tree [2]. Each node of the tree stores a location (which serves as the search key) and a device id (which is unique within an account). We add, one by one, all the locations that were visited by a user; if a location already exists in the tree, we know which device was already seen there. By keeping an array that tracks such cases, we can determine all the clusters in the end. This array has a size equal to the number of devices, and each value in it is the index of the cluster to which the corresponding device belongs.
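The clustering idea can be sketched compactly. The version below differs from the description above in two respects, both of which are substitutions made for brevity: a Python dictionary replaces the balanced binary search tree as the location index, and a union-find structure replaces the tracking array. It also assumes that locations have already been discretised so that equal locations compare equal.

```python
def cluster_devices(visits):
    """visits: iterable of (device_id, location) pairs for one account.
    Returns a dict mapping each device id to a cluster index."""
    parent = {}

    def find(device):
        # Follow parent links to the representative, halving the path on the way.
        while parent[device] != device:
            parent[device] = parent[parent[device]]
            device = parent[device]
        return device

    seen_at = {}                            # location -> a device already seen there
    for device, location in visits:
        parent.setdefault(device, device)
        if location in seen_at:
            # Two devices share a location: merge their clusters.
            parent[find(device)] = find(seen_at[location])
        else:
            seen_at[location] = device

    roots = {}
    return {d: roots.setdefault(find(d), len(roots)) for d in parent}
```

The number of clusters is then the number of distinct values in the returned dictionary.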

**Step 6.** Each device has a degree of mobility, either "mobile" or "static", based on the type of the device. Using the result from Step 5 we can determine which clusters are mobile and which are static. A static cluster has at least one static device and a mobile cluster does not have any static devices. If a subscriber has two or more distinct static clusters, we can safely label this account as being shared. Having the processed clusters at this step, we can also calculate the minimum distance between all clusters. To do this, we have to find the closest pair of locations between every two clusters and after that, apply the Dijkstra algorithm [5] to create a minimum spanning tree. The sum of all remaining edges represents the minimum distance between all clusters.
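Both parts of this step can be sketched briefly. The minimum spanning tree below is built with Prim's algorithm (also attributed to Jarník and to Dijkstra), chosen here as a standard technique; the input is assumed to be a symmetric matrix holding, for every pair of clusters, the distance between their closest locations. The function names and the input layout are illustrative.

```python
import heapq


def has_multiple_static_clusters(cluster_of, is_static):
    """cluster_of: device -> cluster index; is_static: device -> bool.
    True if static devices occur in two or more distinct clusters."""
    static_clusters = {cluster_of[d] for d in cluster_of if is_static[d]}
    return len(static_clusters) >= 2


def mst_total_weight(distance):
    """Sum of the edge weights of a minimum spanning tree (Prim's algorithm)
    over a complete graph given as a symmetric distance matrix."""
    n = len(distance)
    if n == 0:
        return 0.0
    visited = [False] * n
    heap = [(0.0, 0)]                       # (weight of connecting edge, cluster)
    total = 0.0
    while heap:
        weight, node = heapq.heappop(heap)
        if visited[node]:
            continue
        visited[node] = True
        total += weight
        for other in range(n):
            if not visited[other]:
                heapq.heappush(heap, (distance[node][other], other))
    return total
```

For three clusters with pairwise distances 1, 2 and 4, the tree keeps the edges of weight 1 and 2, giving a minimum distance of 3 between all clusters.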

**Step 7.** By using the results from Steps 4 and 5, we can find distinct persons belonging to one or more clusters that have not visited other clusters and have never been in contact with the persons belonging to those clusters. In this instance, we can safely assume that we detected account sharing but, because Step 4 does not always return a single solution, we must check that the number of cases where we identified account sharing, divided by the total number of cases, is 1, before labeling an account as shared. The result of the division will be taken into account when calculating the sharing probability.

**Step 8.** Using the results from the steps above, we determined some thresholds that create a pattern and fit each subscriber in the corresponding usage pattern.

**Step 9.** In the end, using a machine learning algorithm (XGBoost Regressor with the objective of linear regression) a sharing probability is calculated. The algorithm takes into account the number of clusters, the number of devices, minimum number of persons, the usage pattern and the number obtained from Step 7.

#### **5 Technologies**

We decided to put XGBoost [4] at the core of this algorithm since it is efficient, flexible, and quick to train. This was highly important because we did not have any pre-labeled data, and creating many thousands of repetitive entries in order to train a neural network would have been difficult and time consuming. Using this approach we only had to label about one thousand entries for each model. These factors make gradient boosted decision trees a well-suited solution for this problem.

The model used at Step 3 has an XGBClassifier with a structure as displayed in Table 3(a). This model achieved an accuracy of 91.87% for the training data and an accuracy of 93.24% for the validation data, which indicates that there was no overfitting.

**Table 3.** XGBClassifier Structures

For the model at Step 9 we used XGBRegressor with the specifications shown in Table 3(b). The accuracy for the training data was 89.22% and for the validation data 93.11%. For both models, the evaluation metric used was the area under the curve (AUC) [3]. During the tests made to find an optimal model for these tasks, we also obtained configurations with an accuracy close to 100% for the training data but a much lower accuracy for the validation data, meaning that the model had merely memorised the training results.
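To make the evaluation metric concrete, the following sketch computes the AUC directly from its pairwise definition (the fraction of positive/negative pairs that the scores order correctly) and compares training against validation. The labels, scores and variable names are invented for illustration; this is not the evaluation code behind the reported figures.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Hypothetical labels (1 = shared) and predicted sharing probabilities.
train_auc = auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
valid_auc = auc([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.2])

# A large positive gap between training and validation would suggest
# that the model has memorised the training data.
gap = train_auc - valid_auc
```

The quadratic loop is chosen for readability; for large data sets a rank-based formulation would be preferable.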

## **6 Results and Analysis**

We had access to a large data set: Figs. 2, 3 and 4 were created from a data set with more than 13 million subscribers. Each subscriber had one or more events, meaning that, at least for the situations it was tested for, the algorithm produces results that can be considered to reflect reality.

Looking at Fig. 2, we notice that most users are classified as having either less than 20% or 100% sharing probability, meaning that the algorithm is fairly certain of its predictions. This is very important, since it would be troublesome to assign a high probability to a user who is not sharing the account.

Observing Fig. 3, it is obvious how unbalanced the distribution of patterns is. However, looking at Fig. 4, this represents good news for the provider of this data since the patterns that have the highest number of users represent a low risk of sharing.

Overall, 10% of shared accounts may not seem like a large number. However, taking into consideration that these users share their account with at least one other person, it means that, if all those who benefit from sharing were to get a subscription, the number of subscribers would increase by at least 10%. From a marketing point of view, this is a considerable and very favorable percentage for providers.

**Fig. 3.** Pattern distribution where the bar index represents the pattern from Table 1

**Fig. 4.** Average sharing score for each pattern where the bar index represents the pattern from Table 1

## **7 Conclusion**

Looking at the results, we are satisfied with the overall performance since we found a reliable way to identify account sharing. Moreover, we can determine multiple types of sharing. The implementation of this algorithm is simple and can be done by other providers since this type of data is easily available. Even though the presented solution does not identify all accounts which are being shared, we consider this to be a step in the right direction. With further research, we are confident that more usage patterns will emerge and, as a consequence, the number of shared accounts identified might increase.

**Acknowledgements.** I would like to thank Dr. Christian Săcărea for his kindness, help and time invested in making this article possible.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Collaborative Design and Manufacture: Information Structures for Team Formation and Coordination**

Iain Duncan Stalker<sup>1(B)</sup> and Nikolai Kazantsev<sup>2</sup>

<sup>1</sup> Institute of Management, University of Bolton, Bolton, UK IS4@bolton.ac.uk <sup>2</sup> Alliance Manchester Business School, The University of Manchester, Manchester, UK nikolai.kazantsev@manchester.ac.uk

**Abstract.** Our interest here lies in supporting important, but routine and time-consuming activities that underpin success in highly distributed, collaborative design and manufacturing environments; and how information structuring can facilitate this. To that end, we present a simple, yet powerful approach to team formation, partner selection, scheduling and communication that employs a different approach to the task of matching candidates to opportunities or partners to requirements (matchmaking): traditionally, this is approached using either an idea of 'nearness' or 'best fit' (metric-based paradigms); or by finding a subtree within a tree (data structure) (tree traversal). Instead, we prefer concept lattices to establish notions of 'inclusion' or 'membership': essentially, a topological paradigm. While our approach is substantive, it can be used alongside traditional approaches and in this way one could harness the strengths of multiple paradigms.

**Keywords:** Concept lattices · Information structures · Team formation

## **1 Introduction**

The first couple of decades of the twenty-first century have seen many Original Equipment Manufacturers (OEMs) in high value industries, such as the automotive and aerospace sectors, significantly streamline their supply chains, developing strategic partnerships with a reduced number of Tier 1 Suppliers and devolving to them key responsibilities for procurement and management of other suppliers. OEMs were motivated primarily by a need to rationalise administrative burden and to promote agility in response to increasing technical complexity of products and ever shortening product lifecycles. As such, traditional hierarchies have evolved into much flatter organisational structures, where interaction is dynamic and opportunistic, membership fluid, and decision-making decentralised. Unfortunately, these flatter structures have led to more complex coordination and collaboration procedures, posing challenges for small- and medium-sized enterprises (SMEs) to enter contemporary supply chains. It would be beneficial for SMEs to find a means of increasing their visibility to promote inclusion in contemporary supply chains: one possibility is for them to form clusters of complementary expertise so that they can leverage appropriate market opportunities. Subsets of partners from a cluster would pool resources (according to availability, capacity and requirements) to form a short term, dynamic partnership (an *agile partnership*) to respond as a single entity to a specific business opportunity.

A typical opportunity is when an OEM publishes an invitation to tender for a technical system or module, e.g., an interior. Timely response to this by an agile partnership requires rapid coordination of product development activities such as preliminary conceptual design of appropriate subsystems and (conceptual) integration of the resulting specification, among (potential) partners from the cluster. Automated support could accelerate this coordination and improve response to opportunities. Thus, our motivating research question was: *How can we enable quick assembly and informed coordination of agile partnerships in highly distributed, dynamic manufacturing environments?*

Essentially, the problem of assembling an agile partnership is one of matchmaking: identifying requirements and locating suppliers to fulfil these. Traditionally, this is approached using either an idea of 'nearness' or 'best fit' (metric-based paradigms); or by finding a subtree within a tree (data structure) (tree traversal). Here, we present an approach that uses concept lattices and rests on notions of 'inclusion' or 'membership': essentially, a topological paradigm.

Our intention here is to present the approach and outline its applications; we defer a critical comparison with alternatives and discussion of how to integrate with traditional approaches to another work. The paper is structured as follows. In Sect. 2 we briefly summarise the initial and current research contexts for the work; in Sect. 3 we introduce key elements of the formal apparatus and we briefly outline our approach; and in Sect. 4 we provide some simplified examples. We close with some concluding remarks in Sect. 5.

## **2 Research Context**

The initial research context for the work here was in the automotive sector; in particular, working with SMEs forming collaborative clusters known as *Networks of Automotive Excellence* (NoAEs) [5–7]. Membership of these NoAEs is fluid, with partners participating in a number of networks; interaction is dynamic and opportunistic, and decision-making is decentralised. Subsets of partners within an NoAE pool resources, forming short-term, dynamic alliances to respond as a single entity to opportunities in appropriate markets [5]. In these contexts, the responsibility of OEMs is shifting from purchasing and supplier management to brand positioning and design for assembly; convening such networks through Tier 1 Suppliers [6].

The current research context has enlarged to include Industry 4.0 initiatives in aerospace and related industries. DIGICOR (https://www.digicor-project.eu/) is developing a collaboration platform, tools, and services to facilitate the set up and coordination of a production network; these are informed by case specific governance tools and procedures for collaboration, knowledge protection, and security [2]. The platform aims to provide seamless connectivity to existing automation solutions, smart objects, and real-time data sources across the network; this will enable manufacturing companies and service providers to create and operate collaborative networks across the value chain. A key aim is to foster the integration of non-traditional, small, but innovative companies into the complex supply chain of large OEMs. DIGICOR governance rules aim to significantly reduce the burden of setting up collaborative networks and shorten the time to jointly respond to business opportunities.

## **3 Preliminaries**

We briefly introduce some of the formal apparatus underpinning our approach.

### **3.1 Formal Concept Analysis**

*Formal Concept Analysis* (FCA) [4] is a powerful, elegant method of analysis which identifies (conceptual) structures within data sets. The qualifier *formal* typically precedes many of the terms in the vocabulary of FCA to emphasise that these are mathematical notions, which do not necessarily reflect everyday use of the terms. We shall dispense with the qualifier here for convenience.

**Definition 1 (Context and Concept).** *A* context *is a triple* (*G*, *M*, *I*)*, where G is a set of* objects*, M is a set of* attributes *and I* ⊆ *G* × *M is an* incidence relation*. We write gIm for* (*g*, *m*) ∈ *I. Let A* ⊆ *G and B* ⊆ *M. Define A*′ = {*m* ∈ *M* | *gIm*, ∀*g* ∈ *A*}*; then A*′ *is the set of attributes shared by all objects in the set A. Similarly define B*′ = {*g* ∈ *G* | *gIm*, ∀*m* ∈ *B*}*; then B*′ *is the set of all objects possessing the attributes in the set B. These maps are called* derivation operators*. A* concept *of the context* (*G*, *M*, *I*) *is a pair* (*A*, *B*) *such that A*′ = *B and A* = *B*′*. The* extent *of the concept* (*A*, *B*) *is A and the* intent *is B.*

**Definition 2 (Concept Lattice).** *Denote the set of all concepts of a context by* 𝔅(*G*, *M*, *I*)*, or simply* 𝔅 *where the context is clear. Define a partial order,* ≤*, on* 𝔅 *as follows:* (*A*<sub>1</sub>, *B*<sub>1</sub>) ≤ (*A*<sub>2</sub>, *B*<sub>2</sub>) ⇔ *A*<sub>1</sub> ⊆ *A*<sub>2</sub> ⇔ *B*<sub>1</sub> ⊇ *B*<sub>2</sub>*. Then* (𝔅, ≤) *is called the* associated complete lattice of concepts*, or simply* concept lattice*, of the context* (*G*, *M*, *I*)*.*


**Table 1.** A simple context for the planets; after [3].

**Fig. 1.** A concept lattice for the planets from Table 1; after [3].

We illustrate the basics of FCA through a simple example. Table 1 illustrates a simple context for the planets (objects) of the solar system, categorising these according to a number of attributes such as size, distance from the Sun and whether a planet has a moon. Consider the set {Mercury, Venus}. The attributes of this set are {Mercury, Venus}′ = {size-small, distance-near, moon-no}, and the pair ({Mercury, Venus}, {size-small, distance-near, moon-no}) is a concept of the simple context of Table 1, since {size-small, distance-near, moon-no}′ = {Mercury, Venus}. Now consider the set {Mercury, Venus, Earth, Mars}. The attributes of this set are {Mercury, Venus, Earth, Mars}′ = {size-small, distance-near}. The pair ({Mercury, Venus, Earth, Mars}, {size-small, distance-near}) is a concept of the simple context of Table 1. Moreover, since ({Mercury, Venus}, {size-small, distance-near, moon-no}) ≤ ({Mercury, Venus, Earth, Mars}, {size-small, distance-near}), the former is a subconcept of the latter.
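The two derivation operators and the concept test can be written down in a few lines. The sketch below is illustrative only: the incidence rows are an assumption (the context table itself is not reproduced in the text), restricted to five planets and chosen to agree with the derivations quoted above; the attribute `moon-yes`, the row for Jupiter and all function names are hypothetical.

```python
# Assumed incidence rows; illustrative, not copied from Table 1.
CONTEXT = {
    "Mercury": {"size-small", "distance-near", "moon-no"},
    "Venus": {"size-small", "distance-near", "moon-no"},
    "Earth": {"size-small", "distance-near", "moon-yes"},
    "Mars": {"size-small", "distance-near", "moon-yes"},
    "Jupiter": {"size-large", "distance-far", "moon-yes"},
}
OBJECTS = frozenset(CONTEXT)
ATTRIBUTES = frozenset().union(*CONTEXT.values())

def common_attributes(objects):
    """A' : the attributes shared by every object in the set."""
    return frozenset(m for m in ATTRIBUTES if all(m in CONTEXT[g] for g in objects))

def common_objects(attributes):
    """B' : the objects possessing every attribute in the set."""
    wanted = frozenset(attributes)
    return frozenset(g for g in OBJECTS if wanted <= CONTEXT[g])

def is_concept(extent, intent):
    """(A, B) is a concept iff A' = B and B' = A."""
    extent, intent = frozenset(extent), frozenset(intent)
    return common_attributes(extent) == intent and common_objects(intent) == extent

inner = {"Mercury", "Venus"}
outer = {"Mercury", "Venus", "Earth", "Mars"}
inner_intent = common_attributes(inner)
outer_intent = common_attributes(outer)

# Subconcept order: smaller extent, larger intent.
inner_is_subconcept = inner <= outer and inner_intent >= outer_intent
```

Note that a set such as {Mercury, Earth} is not an extent in this assumed context: its shared attributes are {size-small, distance-near}, whose objects are all four inner planets.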

We can provide a pictorial representation of the concepts of our context and their interrelations using a Hasse diagram [3]; see Fig. 1<sup>1</sup>. The concept lattice is read in the following way: objects accumulate from the bottom upwards; and attributes accumulate from the top downwards. For example, the concept at the node marked distance-near includes {size-small, distance-near} as attributes and {Me, V, E, Ma} as objects. The concept lattice for a given context provides a direct manner in which to identify whether a relationship exists between two given concepts; and further, clarifies the nature of this relationship. For example, the concept lattice for a given context allows us to identify the immediate subconcept (respectively, superconcept) of any two concepts of a given context.

<sup>1</sup> The node colourings provide useful information concerning filters and ideals [4] furnished by the tool used to produce this figure, *Concept Explorer* (http://sourceforge.net/projects/conexp). This information is additional to our current purposes, thus we do not discuss it here.
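For a small context, the nodes and edges of such a Hasse diagram can be computed by brute force: close every subset of objects to obtain the concepts, then keep only the order pairs with nothing strictly in between. The sketch below assumes a five-planet context whose rows are illustrative (the context table is not reproduced in the text); the names `up`, `down`, `all_concepts` and `hasse_edges` are our own, and the enumeration is exponential in the number of objects, so it is not a practical lattice-construction algorithm.

```python
from itertools import combinations

# Assumed incidence rows; illustrative, not copied from Table 1.
CONTEXT = {
    "Mercury": {"size-small", "distance-near", "moon-no"},
    "Venus": {"size-small", "distance-near", "moon-no"},
    "Earth": {"size-small", "distance-near", "moon-yes"},
    "Mars": {"size-small", "distance-near", "moon-yes"},
    "Jupiter": {"size-large", "distance-far", "moon-yes"},
}
OBJECTS = sorted(CONTEXT)
ATTRIBUTES = frozenset().union(*CONTEXT.values())

def up(objects):
    """Attributes shared by all the given objects."""
    return frozenset(m for m in ATTRIBUTES if all(m in CONTEXT[g] for g in objects))

def down(attributes):
    """Objects that have all the given attributes."""
    return frozenset(g for g in OBJECTS if attributes <= CONTEXT[g])

def all_concepts():
    """Close every subset of objects; feasible only for small contexts."""
    concepts = set()
    for r in range(len(OBJECTS) + 1):
        for subset in combinations(OBJECTS, r):
            intent = up(subset)
            concepts.add((down(intent), intent))
    return concepts

def hasse_edges(concepts):
    """Pairs (lower, upper) such that upper covers lower in the concept order."""
    edges = set()
    for lo in concepts:
        for hi in concepts:
            if lo[0] < hi[0] and not any(lo[0] < mid[0] < hi[0] for mid in concepts):
                edges.add((lo, hi))
    return edges

concepts = all_concepts()
edges = hasse_edges(concepts)
near = (frozenset({"Mercury", "Venus", "Earth", "Mars"}),
        frozenset({"size-small", "distance-near"}))
```

Reading the result bottom-up reproduces the accumulation of objects described above; reading it top-down reproduces the accumulation of attributes.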

### **3.2 Galois Connection**

Once information about a domain is structured in concept lattices, we can use Galois Connections to interrelate different concept lattices, or even different concepts in the same lattice. A Galois Connection is a pair of "opposite" functions between two partially ordered sets, often powersets, which respects the orders in the sets [1].

**Definition 3 (Galois Connection).** *Let* (*X*, ⊑<sub>*X*</sub>) *and* (*Y*, ⊑<sub>*Y*</sub>) *be partially ordered sets. A* Galois Connection *between the two sets is a pair of maps* α : *X* → *Y and* γ : *Y* → *X such that, for all x* ∈ *X and y* ∈ *Y,*

$$\alpha(x) \sqsubseteq_Y y \Leftrightarrow x \sqsubseteq_X \gamma(y) \tag{1}$$

*We denote the Galois Connection between X and Y by* (*X*,α,*Y*, γ)*.*

**Definition 4 (Closure Operator).** *Let* (*X*, ⊑<sub>*X*</sub>) *be a partially ordered set. A closure operator on X is a map c* : *X* → *X such that, for all x*, *y* ∈ *X, c is* extensive*, x* ⊑<sub>*X*</sub> *c*(*x*)*;* monotone*, x* ⊑<sub>*X*</sub> *y implies c*(*x*) ⊑<sub>*X*</sub> *c*(*y*)*; and* idempotent*, c*(*c*(*x*)) = *c*(*x*)*.*

*Accordingly, any element x* ∈ *X is called* closed *if and only if x* = *c*(*x*)*. We refer to the structure which results from the application of a closure operator to a poset as a closure system or simply closure.*

Amongst the many interesting properties of a Galois Connection is that the consecutive application, the *composition*, of the two "opposite" functions constitutes a closure operator; that is it "collects" upwards, preserves the order and two applications produce the same effect as one.

**Lemma 1.** *Let* (*X*, α, *Y*, γ) *be a Galois Connection between two partially ordered sets,* (*X*, ⊑<sub>*X*</sub>) *and* (*Y*, ⊑<sub>*Y*</sub>)*. Then (composing from left to right)* αγ : *X* → *X defines a closure operator on X and* γα : *Y* → *Y defines a closure operator on Y.* (See [3] for proof.)
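A small numeric instance may help to fix ideas. The example is ours and does not come from the source: on the integers with the usual order, take α(*x*) = ⌈*x*/3⌉ and γ(*y*) = 3*y*. The sketch checks condition (1) by brute force over a finite range and then checks that applying α followed by γ is extensive, monotone and idempotent, i.e. behaves as a closure operator that rounds up to the next multiple of 3.

```python
def alpha(x):
    """Ceiling of x / 3, computed with integer arithmetic."""
    return -(-x // 3)

def gamma(y):
    """Multiply by 3."""
    return 3 * y

def closure(x):
    """Apply alpha first, then gamma (left-to-right composition)."""
    return gamma(alpha(x))

RANGE = range(-20, 21)

# Condition (1): alpha(x) <= y  if and only if  x <= gamma(y).
is_galois = all((alpha(x) <= y) == (x <= gamma(y)) for x in RANGE for y in RANGE)

# The three closure-operator properties, checked on the same range.
extensive = all(x <= closure(x) for x in RANGE)
monotone = all(closure(x) <= closure(y) for x in RANGE for y in RANGE if x <= y)
idempotent = all(closure(closure(x)) == closure(x) for x in RANGE)

# Closed elements are exactly the multiples of 3.
closed_elements = [x for x in range(0, 10) if closure(x) == x]
```

A finite check of this kind is of course not a proof; it only illustrates what the definitions require.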

### **3.3 Galois Connections and Concept Lattices**

Recall the derivation operators from Subsect. 3.1 used to establish a relation from sets of objects to sets of attributes (shared by these objects) and vice-versa. These can be thought of as, and in fact are, functions on the powersets of objects and attributes. The lemma below shows that these constitute a Galois Connection between the two powersets.

**Lemma 2.** *Let* (*G*, *M*, *I*) *be a context. Recall the derivation operators* ′ : ℘(*G*) → ℘(*M*) *and* ′ : ℘(*M*) → ℘(*G*)*. Then* (℘(*G*), ′, ℘(*M*), ′) *is a Galois Connection between the posets* (℘(*G*), ⊆) *and* (℘(*M*), ⊇)*.* (See [4] for proof.)

Any powerset and its dual are complete lattices [3], so we have that (℘(*G*), ⊆) and (℘(*M*), ⊇) are complete lattices. Thus, we can combine the results of Lemmas 1 and 2.

**Corollary 1.** *Let* (*G*, *M*, *I*) *be a context. Recall the derivation operators* ′ : ℘(*G*) → ℘(*M*) *and* ′ : ℘(*M*) → ℘(*G*)*. Then* ″ : ℘(*G*) → ℘(*G*) *is a closure operator on* (℘(*G*), ⊆) *and* ″ : ℘(*M*) → ℘(*M*) *is a closure operator on* (℘(*M*), ⊇)*.*
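The object-side half of this statement can be checked mechanically on a small context. The sketch below composes the two derivation operators on sets of objects and verifies the three closure properties over the whole powerset; the sets left unchanged by the composite are then the concept extents. The five-planet rows are an illustrative assumption (the context table is not reproduced in the text), and all names are hypothetical.

```python
from itertools import combinations

# Assumed incidence rows; illustrative, not copied from Table 1.
CONTEXT = {
    "Mercury": {"size-small", "distance-near", "moon-no"},
    "Venus": {"size-small", "distance-near", "moon-no"},
    "Earth": {"size-small", "distance-near", "moon-yes"},
    "Mars": {"size-small", "distance-near", "moon-yes"},
    "Jupiter": {"size-large", "distance-far", "moon-yes"},
}
OBJECTS = frozenset(CONTEXT)
ATTRIBUTES = frozenset().union(*CONTEXT.values())

def up(objects):
    """Derivation from sets of objects to sets of attributes."""
    return frozenset(m for m in ATTRIBUTES if all(m in CONTEXT[g] for g in objects))

def down(attributes):
    """Derivation from sets of attributes to sets of objects."""
    return frozenset(g for g in OBJECTS if attributes <= CONTEXT[g])

def close(objects):
    """The composite of the two derivations on sets of objects."""
    return down(up(objects))

SUBSETS = [frozenset(c)
           for r in range(len(OBJECTS) + 1)
           for c in combinations(sorted(OBJECTS), r)]

extensive = all(A <= close(A) for A in SUBSETS)
monotone = all(close(A) <= close(B) for A in SUBSETS for B in SUBSETS if A <= B)
idempotent = all(close(close(A)) == close(A) for A in SUBSETS)

# The closed sets are the concept extents of this small context.
closed_sets = {A for A in SUBSETS if close(A) == A}
```

Of the 32 subsets of objects only a handful are closed, which is one way to see the removal of redundancy performed by the closure.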

*Remark 1* **(Notation).** We write simply ℘(*G*) for (℘(*G*), ⊆), when it is clear that the partial order is the usual subset inclusion; and we write ℘(*M*)<sup>∂</sup> for (℘(*M*), ⊇). We denote the closures of these under the compositions of the derivation operators ℘(*G*) and ℘(*M*)<sup>∂</sup>, respectively.

For a particular context, (*G*, *M*, *I*), the actions of these closure operators, on ℘(*M*)<sup>∂</sup> and on ℘(*G*), generate the concept intents and extents, respectively. Moreover, the structures of the closure systems induced on ℘(*M*)<sup>∂</sup> and ℘(*G*) are identical; and these structures are co-located in the concept lattice, 𝔅(*G*, *M*, *I*). Informally, we can think of the closure operators as removing redundancy:


### **3.4 Observations**

FCA identifies those objects which are indistinguishable under a given incidence relation to a particular set of attributes: indistinguishable objects belong to the same element of the associated closure (and comprise the extent of the related concept). For example, Mercury and Venus are indistinguishable using the attributes of the context in Table 1; thus, they are identified as the same "element" in the closure.

Different attribute sets will give rise to different closures: in particular, subsets will give rise to substructures. In FCA, a context derived from another by considering only a subset of attributes (or objects) is called a *subcontext* [4]. For example, the lattice in Fig. 2 derives from a subcontext of Table 1 that considers only attributes for size and distance; and ignores presence or absence of a moon. Again, the lattice derives from closures on the associated power sets<sup>2</sup>. When we use the subset of attributes (from the subcontext), we see that Mercury and Venus are still indistinguishable from each other, but now, Earth and Mars are also indistinguishable from these; thus, they are identified as the same "element" in the new closure, which is a coarser system. Of course, all of the information of the sub-lattice is actually contained in the full lattice; however, identifying a subset of attributes of interest and using these to project onto a sub-lattice makes the relationships and indistinguishable elements much clearer (as redundant information is removed); and indeed, more visible. This becomes more valuable as the number of objects and attributes increases. This means that for a given context, we can use subsets of attributes to explore more directly the interrelationships of objects from different perspectives; and visualise these. This is essentially what we do when we use lattices to match suppliers with requirements, coordinate meetings, etc., as illustrated in Sect. 4.

<sup>2</sup> Furthermore, a Galois connection obtains between the concept lattice of the full context (full lattice) and the concept lattice of the subcontext (sub-lattice). Thus, the above sub-lattice is also a closure of the full lattice.

**Fig. 2.** Concept lattice for a reduced context of planets.
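The effect of moving to a subcontext can be seen by grouping objects that have exactly the same attributes, before and after restricting the attribute set. In the sketch below the incidence rows are an illustrative assumption (the context table is not reproduced in the text), and `restrict` and `indistinguishable_classes` are hypothetical helper names.

```python
# Assumed incidence rows; illustrative, not copied from Table 1.
CONTEXT = {
    "Mercury": {"size-small", "distance-near", "moon-no"},
    "Venus": {"size-small", "distance-near", "moon-no"},
    "Earth": {"size-small", "distance-near", "moon-yes"},
    "Mars": {"size-small", "distance-near", "moon-yes"},
    "Jupiter": {"size-large", "distance-far", "moon-yes"},
}

def restrict(context, attributes):
    """Subcontext that keeps all objects but only the given attributes."""
    return {g: attrs & attributes for g, attrs in context.items()}

def indistinguishable_classes(context):
    """Group objects whose attribute sets coincide."""
    classes = {}
    for g, attrs in context.items():
        classes.setdefault(frozenset(attrs), set()).add(g)
    return sorted(sorted(c) for c in classes.values())

SIZE_AND_DISTANCE = {"size-small", "size-large", "distance-near", "distance-far"}

full_classes = indistinguishable_classes(CONTEXT)
coarse_classes = indistinguishable_classes(restrict(CONTEXT, SIZE_AND_DISTANCE))
```

With the moon attributes dropped, the two classes of inner planets merge into one, which is the coarsening described above.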

## **4 Application**

We provide some (simplified) examples of how the approach can be applied in the aerospace industry to facilitate team formation for responses to invitations to tender (Subsect. 4.1), to coordinate meetings (Subsect. 4.2), to identify membership of project subgroups and identify key interactions of team members (Subsect. 4.3).

### **4.1 Invitations to Tender**

An *Invitation to Tender* is a formal invitation made by an OEM to suppliers to make an offer, i.e., propose terms, for the supply of specific goods or services. Typically, an ordinary aerospace tender includes a statement of requirements that clarify expectations of suppliers, specify products and services needed and identify volumes, time frames and key dates. An OEM will only consider tenders from suitably qualified partnerships and demands will usually address:


**Fig. 3.** A concept lattice for suppliers

An OEM may also require that a submitting partnership has a working history (thus, that partners are trusted by each other and are not new entrants to the cluster).

It is a simple matter to construct a context that allows us to characterise the suppliers (objects) in our cluster using relevant descriptors (attributes); Fig. 3 shows a simplified context for ten suppliers (S1, ..., S10) characterised using attributes relating to these descriptors (Min-Turnover, Min-Capacity, Trusted, New, ASD 9100, ISO 16949, NADCAP, Proximal and Min-CSR). We read the lattice as usual: the white labels collect upwards and the grey labels collect downwards. Here, a "concept" indicates which suppliers fulfil various requirements captured by subsets/combinations of the attributes. This provides information which can be invaluable for coordinating for tenders. Moreover, manipulating the lattice and projecting onto sub-lattices according to different subsets can reveal which suppliers meet certain criteria more directly.

Suppose that the OEM requires that the partnership has a sound working history and has stipulated minimum capacity, NADCAP capability, location within a specific proximity and appropriate CSR certification. By collapsing the full lattice to an appropriate sub-lattice (for the subcontext of attributes Min-Capacity, Trusted, NADCAP, Proximal and Min-CSR; see Fig. 4), we see immediately that only suppliers S1, S6 and S7 are suitable partners for the tender (from the current set). Of course, we have made a number of simplifications here: we have not, for example, considered whether the expertise of these three would be sufficient. It is more likely that the ten suppliers would all be candidates for the same particular aspect of the tender, and that our projection onto a sub-lattice would be used to identify potential candidates. We may then select one from these three or ask the three to coordinate on that aspect of the tender: this would build redundancy into the supply chain, if the tender were accepted, thus fostering resilience.

**Fig. 4.** A concept lattice for suppliers
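In terms of the derivation operators, this selection amounts to taking the set of required attributes and computing the suppliers that possess all of them. The sketch below illustrates the query; the incidence data behind the supplier lattice is not available in the text, so the rows are invented (and cover only six of the ten suppliers) purely so that the query reproduces the outcome reported above. The function name is hypothetical.

```python
# Invented supplier rows; NOT the data behind the supplier lattice.
SUPPLIERS = {
    "S1": {"Min-Turnover", "Min-Capacity", "Trusted", "NADCAP", "Proximal", "Min-CSR"},
    "S2": {"Min-Capacity", "New", "NADCAP", "Proximal"},
    "S3": {"Min-Turnover", "Trusted", "ISO 16949", "Min-CSR"},
    "S6": {"Min-Capacity", "Trusted", "ASD 9100", "NADCAP", "Proximal", "Min-CSR"},
    "S7": {"Min-Capacity", "Trusted", "NADCAP", "Proximal", "Min-CSR"},
    "S9": {"Min-Capacity", "Trusted", "NADCAP", "Min-CSR"},
}

def qualifying_suppliers(required):
    """Suppliers that possess every required attribute."""
    return sorted(s for s, attrs in SUPPLIERS.items() if required <= attrs)

OEM_REQUIREMENTS = {"Min-Capacity", "Trusted", "NADCAP", "Proximal", "Min-CSR"}
candidates = qualifying_suppliers(OEM_REQUIREMENTS)

# Relaxing a requirement can only enlarge the candidate set.
relaxed = qualifying_suppliers(OEM_REQUIREMENTS - {"Proximal"})
```

The second query illustrates the order reversal of the derivation operators: fewer required attributes give a larger (or equal) set of suppliers.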

### **4.2 Coordinating Meetings**

Consider the following (extremely) simplified subset of interior features of a fuselage for an airliner: chairs, windows, vents, internal panels, lighting systems and (overhead) lockers. Table 2 combines these with a relevant (again, extremely simplified) subset of service providers (Paneller, HVAC Supplier, Upholsterer, Lighting Specialist, Fixture Systems Provider, Seating Specialist, and Specialist Glass Supplier) into a context which gives rise to the concept lattice in Fig. 5. Again, we read the lattice with white labels collecting upwards and grey labels accumulating downwards. Here, a "concept" indicates which suppliers associate, i.e. have an interest in or contribute expertise necessary for a particular feature or set of features. This provides information which can be invaluable for arranging meetings. For example, we can infer from the node carrying the label "Windows" that the interests of the Specialist Glass Supplier and the Paneller coincide and that it is only these two that need to meet to finalise the relevant specifications. Thus, we know that we shall need to coordinate meetings between these two for purposes of discussing the Windows. We can also see that no single supplier needs to meet about every feature, as the lowest node in the lattice has no object (supplier) associated with it. Moreover, we can also see that no single feature requires the input of every supplier, as the highest node in the lattice has no attribute (feature) associated with it. Of course, we can draw these conclusions quite easily from the context; however, this would become increasingly difficult as a context enlarges.

**Table 2.** A simple context for a fuselage interior.

**Fig. 5.** A concept lattice for fuselage suppliers
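The meeting inferences above can be phrased as simple queries on the context. The sketch below is illustrative: the entries of the fuselage context are not reproduced in the text, so the incidence rows are invented, constrained only to agree with the statements made in this section (for instance, that Windows concerns exactly the Specialist Glass Supplier and the Paneller). All function names are hypothetical.

```python
# Invented incidence rows; NOT copied from the fuselage context table.
INTERESTS = {
    "Paneller": {"Windows", "Vents", "Internal Panels", "Lighting Systems", "Lockers"},
    "HVAC Supplier": {"Vents", "Internal Panels"},
    "Upholsterer": {"Chairs"},
    "Lighting Specialist": {"Lighting Systems"},
    "Fixture Systems Provider": {"Chairs", "Vents", "Internal Panels",
                                 "Lighting Systems", "Lockers"},
    "Seating Specialist": {"Chairs"},
    "Specialist Glass Supplier": {"Windows"},
}
FEATURES = frozenset().union(*INTERESTS.values())

def attendees(feature):
    """Suppliers who need to meet about a given feature."""
    return sorted(s for s, feats in INTERESTS.items() if feature in feats)

# Bottom of the lattice: suppliers involved in every feature (expected: none).
involved_in_everything = [s for s, feats in INTERESTS.items() if feats == FEATURES]

# Top of the lattice: features needing every supplier (expected: none).
needing_everyone = [f for f in sorted(FEATURES)
                    if len(attendees(f)) == len(INTERESTS)]

meeting_plan = {f: attendees(f) for f in sorted(FEATURES)}
```

The resulting `meeting_plan` maps each feature to the group that should be convened for it.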

### **4.3 Project Subgroups**

The more complex a product or technical system becomes, the more convenient it is to have formal subgroups working on different aspects of development. Projecting the full concept lattice of Fig. 5 directly onto the sub-lattices deriving from the subcontext generated for a particular feature makes directly clear those suppliers who must be part of the subgroup. For example, Fig. 6 shows directly who is needed for the Light System Project Subgroup.

Finally, selecting those features relating to a specific supplier and projecting the full concept lattice of Fig. 5 directly onto the sub-lattice deriving from the relevant subcontext reveals essential interactions, specifically subgroups and meetings, from the perspective of that supplier. For example, Fig. 7 shows directly the interactions of the HVAC Supplier. Interestingly, every feature that requires the expertise of the HVAC Supplier requires that of both the Paneller and the Fixtures Provider; however, we *cannot* infer the converse.

**Fig. 6.** A concept lattice for the light system subgroup

**Fig. 7.** Meetings and subgroups for the HVAC supplier
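Both projections can be imitated with set operations on the context: a subgroup is the set of suppliers attached to one feature, and the view of a single supplier lists, for each of its features, the other suppliers involved. The one-directional dependency noted for the HVAC Supplier is then a subset test. As before, the incidence rows are invented to be consistent with the statements in the text and are not the published context; all names are hypothetical.

```python
# Invented incidence rows; NOT copied from the fuselage context table.
INTERESTS = {
    "Paneller": {"Windows", "Vents", "Internal Panels", "Lighting Systems", "Lockers"},
    "HVAC Supplier": {"Vents", "Internal Panels"},
    "Upholsterer": {"Chairs"},
    "Lighting Specialist": {"Lighting Systems"},
    "Fixture Systems Provider": {"Chairs", "Vents", "Internal Panels",
                                 "Lighting Systems", "Lockers"},
    "Seating Specialist": {"Chairs"},
    "Specialist Glass Supplier": {"Windows"},
}

def subgroup(feature):
    """Suppliers who must be part of the subgroup for a feature."""
    return sorted(s for s, feats in INTERESTS.items() if feature in feats)

def interactions(supplier):
    """For each feature of the supplier, the other suppliers involved."""
    return {f: [s for s in subgroup(f) if s != supplier]
            for f in sorted(INTERESTS[supplier])}

def always_accompanied_by(supplier, other):
    """True if every feature involving `supplier` also involves `other`."""
    return INTERESTS[supplier] <= INTERESTS[other]

lighting_subgroup = subgroup("Lighting Systems")
hvac_view = interactions("HVAC Supplier")
hvac_needs_paneller = always_accompanied_by("HVAC Supplier", "Paneller")
paneller_needs_hvac = always_accompanied_by("Paneller", "HVAC Supplier")
```

The asymmetry between the last two values is the statement that the converse cannot be inferred.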

## **5 Concluding Remarks**

We have reported on investigations into information structuring to facilitate the automation of important, but routine and time-consuming activities of agile partnerships operating within highly distributed, collaborative environments. These explorations are grounded in the domains of Networks of Automotive Excellence and Industry 4.0 initiatives in aerospace. We have outlined how a synthesis of mathematical notions provides a simple, yet powerful approach to facilitate *inter alia* partnership formation; the selection of working groups from partnerships; and the scheduling of workshops and subgroup meetings, identifying the subject matter for these. We believe that our approach is innovative in that we re-think the problem of matching candidates to opportunities or partners to requirements; we frame this as a topological notion of set membership rather than taking traditional metric-based or tree traversal approaches, which, while effective, can be computationally expensive and time-consuming. Our aim is not to challenge established methods; rather, our intention here has been to present the approach and outline its applications to provide food for thought and to stimulate discussion. Thus, we defer a comparison with alternatives and in-depth critique to another work.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Invited Additional Contributions**

## **Approximate Knowledge Graph Query Answering: From Ranking to Binary Classification**

Ruud van Bakel<sup>1,2</sup>, Teodor Aleksiev<sup>1,3</sup>, Daniel Daza<sup>1,2,4</sup>, Dimitrios Alivanistos<sup>1,4</sup>, and Michael Cochez<sup>1,4(B)</sup>

<sup>1</sup> Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands ruudvanbakel@yahoo.co.uk, {d.dazacruz,d.alivanistos,m.cochez}@vu.nl <sup>2</sup> University of Amsterdam, Amsterdam, The Netherlands <sup>3</sup> Leiden University, Leiden, The Netherlands aleksiev.teodord@gmail.com <sup>4</sup> Discovery Lab, Elsevier, Amsterdam, The Netherlands

https://discoverylab.ai

**Abstract.** Large, heterogeneous datasets are characterized by missing or even erroneous information. This is more evident when they are the product of community effort or automatic fact extraction methods from external sources, such as text. A special case of the aforementioned phenomenon can be seen in knowledge graphs, where this mostly appears in the form of missing or incorrect edges and nodes.

Structured querying on such incomplete graphs will result in incomplete sets of answers, even if the correct entities exist in the graph, since one or more edges needed to match the pattern are missing. To overcome this problem, several algorithms for approximate structured query answering have been proposed. Inspired by modern Information Retrieval metrics, these algorithms produce a ranking of all entities in the graph, and their performance is further evaluated based on how high in this ranking the correct answers appear.

In this work we take a critical look at this way of evaluation. We argue that performing a ranking-based evaluation is not sufficient to assess methods for complex query answering. To solve this, we introduce Message Passing Query Boxes (MPQB), which takes binary classification metrics back into use and shows the effect this has on the recently proposed query embedding method MPQE.

**Keywords:** Query answering · Geometric representation · Box embeddings · Approximation

## **1 Introduction**

In many organizations, a vast amount of complex information is used in operations daily. This data is often stored in various databases or file systems, while information can be retrieved using query languages and information retrieval techniques. During the past decade, several companies have started taking up knowledge graphs (KG) [10] as a way to represent heterogeneous data and make it useful for a large variety of applications [14]. To make said data accessible, various querying languages like SPARQL and Cypher have been developed. Such querying languages allow for accessing nodes in the graph, traversing them via specific relations, or retrieving nodes that match a specific pattern. At the core of these languages lie graph patterns. These patterns can be thought of as graph shaped structures where some nodes and edges can correspond to nodes existing in the graph, while others correspond to variables (with specific variable names). When a match for this pattern is found in the graph, the variables are bound and the appropriate values are returned as the result.

However, the performance of the previously described process is heavily dependent on the level of completeness in the graph.

In more detail, completeness refers to whether the graph contains all the nodes and edges in the graph pattern, and has a binding for all variables. A single node or edge missing from the graph, which represents a comparatively small piece of information, results in missing answers. This can be desirable, when the missing element was an erroneous piece of information, or harmful, when correct information is missing from the graph.
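The sensitivity of pattern matching to a single missing edge can be demonstrated with a toy matcher. The sketch below is a deliberately naive illustration and not how SPARQL or Cypher engines are implemented: the graph is a list of triples, variables are strings starting with `?`, and the entity and relation names are invented.

```python
def match(pattern, triples, binding=None):
    """Yield variable bindings under which every pattern triple is in the graph."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)

# Invented toy graph.
GRAPH = [
    ("alice", "worksAt", "vu"),
    ("bob", "worksAt", "vu"),
    ("carol", "worksAt", "uva"),
    ("vu", "locatedIn", "amsterdam"),
    ("uva", "locatedIn", "amsterdam"),
]
# "Who works at an organisation located in Amsterdam?"
PATTERN = [("?p", "worksAt", "?o"), ("?o", "locatedIn", "amsterdam")]

def answers(triples):
    return sorted({b["?p"] for b in match(PATTERN, triples)})

complete_answers = answers(GRAPH)

# Drop one edge: carol is still in the graph, but no longer an answer.
incomplete_graph = [t for t in GRAPH if t != ("uva", "locatedIn", "amsterdam")]
incomplete_answers = answers(incomplete_graph)
```

The entity that disappears from the answer set is still present in the graph; only one edge needed by the pattern is gone.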

In this paper, we focus on this issue, specifically the case of missing edges in the graph. Ideally, we would like a query system that can still give answers when the phenomenon described before applies. We would like to have *approximate query answering*.

One way to approach this is by performing link prediction. In link prediction, one tries to predict missing links in the graph by training a machine learning model on the known parts of it. While not trivial, it is possible to use a single-link prediction mechanism to answer queries with missing links. Another way to approach this problem is by using so-called query encoders. These encoders take a query as input and produce an embedding (a high-dimensional vector representation) for it. This query embedding is then compared to learned embeddings for the entities in the graph. The machine learning system is optimised in such a way that entities close to the query embedding in vector space are also its probable answers.

In this paper we focus on the analysis and evaluation of these systems. Typically, such systems return a series of candidate answers to the query, accompanied by a likelihood or a distance from the query embedding in vector space. In the evaluation phase, this ranking is compared not to a ground truth ranking, but rather to the set of correct answers to the query. To do this, typical measures like hits@n (how many correct answers appear among the top n) and mean reciprocal rank (MRR, the average reciprocal of the rank of correct answers) are used. While these measures are appropriate for information retrieval systems, they fall short when it comes to query systems. In the latter, results are not ranked: an entity either is a correct answer or it is not.

This is also reflected in how these measures are usually adapted into filtered versions. In this case, measures like hits@n and MRR are computed such that other true answers that appear higher in the returned ranking are ignored when computing, for example, the rank of a lower-ranked correct entity.
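As a rough illustration of these measures, the following sketch computes filtered ranks, hits@n and MRR for a single query. The function names and the toy ranking are ours, and hits@n is taken here as the fraction of correct answers whose filtered rank is at most n, which is one common convention rather than a definition given in this paper.

```python
def filtered_ranks(ranking, answers):
    """Rank of each correct answer, ignoring the other correct answers above it."""
    ranks = {}
    for target in answers:
        rank = 1
        for entity in ranking:
            if entity == target:
                break
            if entity not in answers:  # other true answers are filtered out
                rank += 1
        ranks[target] = rank
    return ranks


def hits_at_n(ranks, n):
    """Fraction of correct answers with a filtered rank of at most n."""
    return sum(1 for r in ranks.values() if r <= n) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all correct answers."""
    return sum(1.0 / r for r in ranks.values()) / len(ranks)


# Toy example: three correct answers interleaved with two wrong entities.
ranking = ["p1", "x", "p2", "y", "p3"]
answers = {"p1", "p2", "p3"}
ranks = filtered_ranks(ranking, answers)
print(ranks, hits_at_n(ranks, 1), mean_reciprocal_rank(ranks))
```

Note that neither measure says where the list of answers should stop, which is the gap the remainder of the paper addresses.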

We argue that we need to look into metrics that are not based on specific ranking of the results, but rather on a crisp set of results retrieved from these systems. A main argument for why this is necessary is that many downstream tasks using the aforementioned results need to get a finite set of answers from the knowledge graph, not just a ranked list of all possible entities. That is, we need a query engine that does not just act as a ranking system, but as a binary classifier: it must provide a set of entities that are answers to the query while all other entities are not. In this scenario, the evaluation would be the same as what has traditionally been used for classification problems, with measures such as precision and recall.

This paper is structured as follows: in Sect. 2, we review several algorithms used for approximate query answering. Then, in Sect. 3 we discuss how metrics for binary classification can provide additional insight on top of the metrics used for ranking. We end that section with a general direction on how this could be achieved in the existing systems using volumetric query embeddings. Sect. 4 details a first approach for solving this problem using axis-aligned hyper-rectangles for these queries. We describe the MPQB model, a proof of concept, in the section after that. Finally, we provide a conclusion and future outlook.

This work is largely based on the Bachelor thesis works of Ruud van Bakel [3] and Teodor Aleksiev [1], who both worked under the supervision of Michael Cochez at the Vrije Universiteit Amsterdam.

#### **2 Approximate Query Answering on Knowledge Graphs**

We define a knowledge graph as a tuple $G = (V, R, E)$, where $V$ is a set of entities, $R$ a set of relation types, and $E$ a set of binary predicates of the form $r(h, t)$ where $r \in R$ and $h, t \in V$. Each binary predicate represents an edge of type $r$ between the entities $h$ and $t$, and thus we call $E$ the set of edges in the knowledge graph.

A query on a KG looks for the set of entities that meet a particular condition, specified in terms of binary predicates whose arguments can be constants (i.e. entities in V), or variables. As an example, consider the following query (adapted from [4]): "Select all projects P, such that topic T is related to P, and both *Alice* and *Bob* work on T". In this query, the constant entities are *Alice* and *Bob*, and the variables are denoted as P and T. We can define such a query formally in terms of a conjunction of binary predicates, as follows:

$$q = P.\exists T, P : \text{related}(T, P) \land \text{works\_on}(\text{Alice}, T) \land \text{works\_on}(\text{Bob}, T). \tag{1}$$

More formally, we are interested in answering *conjunctive queries*, that have the following general form:

$$q = V_t.\exists V_1, \dots, V_m : r_1(a_1, b_1) \land \dots \land r_m(a_m, b_m). \tag{2}$$

In this notation, $r_i \in R$, and $a_i$ and $b_i$ are constant entities in the KG, or variables from the set $\{V_t, V_1, \dots, V_m\}$.
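To make the exact semantics of such a query concrete, the sketch below evaluates the example query (1) on a toy edge set and shows how removing a single edge removes an answer. The entities `ml`, `kr`, `project_a` and `project_b` are made up for illustration and do not come from any dataset used in this paper.

```python
# Edges are stored as (relation, head, tail) triples.
edges = {
    ("works_on", "Alice", "ml"),
    ("works_on", "Bob", "ml"),
    ("works_on", "Alice", "kr"),
    ("related", "ml", "project_a"),
    ("related", "kr", "project_b"),
}


def answer_example_query(edges):
    """Projects P with a topic T such that related(T, P), works_on(Alice, T), works_on(Bob, T)."""
    alice_topics = {t for (r, h, t) in edges if r == "works_on" and h == "Alice"}
    bob_topics = {t for (r, h, t) in edges if r == "works_on" and h == "Bob"}
    shared_topics = alice_topics & bob_topics  # bindings for the variable T
    return {p for (r, t, p) in edges if r == "related" and t in shared_topics}


full_answers = answer_example_query(edges)
# Remove one edge: exact matching now loses the answer entirely.
incomplete_answers = answer_example_query(edges - {("works_on", "Bob", "ml")})
print(full_answers, incomplete_answers)
```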

Recent works have proposed to use machine learning methods to answer such queries. These methods operate by learning a vector representation in a space $\mathbb{R}^d$ for each entity and relation type. These representations are also known as *embeddings*, and we denote them as $\mathbf{e}_v$ for $v \in V$ and $\mathbf{e}_r$ for $r \in R$. Similarly, these methods define a *query embedding function* $\phi$ (usually defined with some free parameters), that maps a query $q$ to an embedding $\phi(q) = \mathbf{q} \in \mathbb{R}^d$.

Given a query embedding **q**, a score for every entity in the graph can be obtained via cosine similarity:

$$\text{score}(\mathbf{q}, \mathbf{e}_v) = \frac{\mathbf{q}^\top \mathbf{e}_v}{\lVert\mathbf{q}\rVert \, \lVert\mathbf{e}_v\rVert}.$$

The entity and relation type embeddings, as well as any free parameters in the embedding function φ, are optimized via stochastic gradient descent on a specific loss function. Usually the loss is defined so that for a given embedding of a query, the cosine similarity is maximized with embeddings of entities that answer the query, and minimized for embeddings of entities sampled at random.

The dataset used for training consists of query-answer pairs mined from the graph. Once the procedure terminates, the function φ can be used to embed a query. The entities in the graph can then be ranked as potential answers, by computing the cosine similarity of all the entity embeddings and the embedding of the query.
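The scoring and ranking step can be sketched as follows. The two-dimensional embeddings are arbitrary toy values standing in for learned ones, and the helper names are ours.

```python
import math


def cosine(q, e):
    """Cosine similarity between a query embedding and an entity embedding."""
    dot = sum(a * b for a, b in zip(q, e))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_e = math.sqrt(sum(b * b for b in e))
    return dot / (norm_q * norm_e)


def rank_entities(query_embedding, entity_embeddings):
    """Return all entities, ordered from most to least similar to the query."""
    scores = {v: cosine(query_embedding, e) for v, e in entity_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings; in practice these are learned with gradient descent.
entity_embeddings = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [-1.0, 0.2]}
ranking = rank_entities([1.0, 0.1], entity_embeddings)
print(ranking)
```

The output is always a ranking of every entity in the graph, with no indication of which prefix of the ranking constitutes the answer set.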

Note that in contrast with classical approaches to query answering, such as the use of SPARQL in a graph database, this approach can return answers even if no entity in the graph matches exactly every condition in the query.

In the next sections we review the specifics of recently proposed methods, which consider particular geometries for embedding entities, relation types, and queries; as well as scoring functions.

**Fig. 1.** The query $q = P.\exists T, P : \text{related}(T, P) \land \text{works\_on}(\text{Alice}, T) \land \text{works\_on}(\text{Bob}, T)$ can be represented as a directed acyclic graph, where the leaves are constant entities, the intermediate node $T$ is a variable, and $P$ is the target entity. (Adapted from a figure in [4])

#### **2.1 GQE**

Conjunctive queries can be represented as a directed acyclic graph, where the leaf nodes are constant entities, any intermediate nodes are variables, and the root node is the target variable of the query. In this graph, the edges have labels that correspond to the relation type involved in a predicate.

We illustrate this in Fig. 1 for the example query introduced previously. In Graph Query Embedding (GQE) [9], the authors note that this graph can be employed to define a computation graph that starts with the embeddings of the entities at the leaves, and follows the structure of the query graph until the target node is reached.

GQE was one of the first models that defined a query embedding function to answer queries over KGs. The function relies on two different mechanisms, one of which handles paths and the other intersections. Training it requires generating a large dataset of queries with diverse shapes that incorporate paths and intersections.

#### **2.2 MPQE**

Graph Convolutional Networks (GCNs) [5,8,11] are an extension of neural networks to graph-structured data, that allow defining flexible operators for a variety of machine learning tasks on graphs. Relational Graph Convolutional Networks (R-GCNs) [17] are a special case that introduces a mechanism to deal with different relation types as they occur in KGs, and have been shown to be effective for tasks like link prediction and entity classification.

In MPQE [4], the authors note that a more general query embedding function can be defined in comparison with GQE, if an R-GCN is employed to map the query graph to an embedding. The generality stems from the fact that the R-GCN uses a general message-passing mechanism to embed the query, instead of relying on specific operators for paths and intersections.

#### **2.3 Query2Box**

Both GQE and MPQE embed a query as a single vector (i.e., a point in space). Query2Box [15] deviates from this idea and uses a box shape to represent a query. The method further narrows the allowed embedding shape to axis-aligned hyperrectangles. We will discuss more in Sect. 4 why that is beneficial. This method has several benefits, especially for conjunctive queries; for these queries, the answer set can be seen as the intersection of the answers to the conjuncts. Such an operation can be imagined with an embedded volume, but not with a vector embedding.

While this method would have made it possible to create a binary classifier, the model is not specifically trained, nor evaluated for multiple answers.

#### **2.4 Complex Query Decomposition**

Complex Query Decomposition (CQD) [2] is a recently proposed method for query answering based on using simple methods for 1-hop link prediction to answer more complex queries. In CQD, the link predictors used are DistMult [21] and ComplEx [20]. Such link predictors are more data efficient than the previous methods, since they only need to be trained with the set of observed triples. In contrast, to be effective the previous methods require mining millions of queries covering a wide range of structures.

In CQD, a complex query is decomposed in terms of its binary predicates. The link predictor is used to compute scores for each of them, and the scores are then aggregated with t-norms, which have been employed in the literature as continuous relaxations of the conjunction and disjunction operators [12,13,18].
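The aggregation idea can be sketched for the example query (1) as follows. This is only an illustration under several assumptions: link-predictor scores are taken to lie in [0, 1], the toy score table stands in for a trained predictor, and the existential variable is resolved by brute-force maximisation over candidate topics, which is not claimed to be the optimisation procedure used in CQD.

```python
def product_tnorm(scores):
    """Product t-norm: a continuous relaxation of conjunction."""
    result = 1.0
    for s in scores:
        result *= s
    return result


def godel_tnorm(scores):
    """Goedel (minimum) t-norm."""
    return min(scores)


def score_target(target, candidate_topics, link_score, tnorm):
    """Score P for: exists T. related(T, P) and works_on(Alice, T) and works_on(Bob, T)."""
    return max(
        tnorm([
            link_score("related", t, target),
            link_score("works_on", "Alice", t),
            link_score("works_on", "Bob", t),
        ])
        for t in candidate_topics
    )


# Hypothetical 1-hop scores; unknown triples get a small default score.
toy_scores = {
    ("related", "ml", "project_a"): 0.9,
    ("works_on", "Alice", "ml"): 0.8,
    ("works_on", "Bob", "ml"): 0.5,
}


def link_score(relation, head, tail):
    return toy_scores.get((relation, head, tail), 0.05)


product_score = score_target("project_a", ["ml", "kr"], link_score, product_tnorm)
godel_score = score_target("project_a", ["ml", "kr"], link_score, godel_tnorm)
print(product_score, godel_score)
```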

CQD provides an answer to the query by providing a ranking of entities based on the maximization of the aggregated scores. Therefore, the evaluation procedure for CQD is the same as the previous methods.

#### **3 From Ranking Metrics to Actual Answers**

As discussed above, there are merits to returning a hard answer set as opposed to returning a ranking. One way to obtain such binary classifications is to define a threshold within a ranking. As we will further describe in Sect. 4, one can create such a threshold by using shapes (e.g. axis-aligned hyper-rectangles) for query embeddings.

#### **3.1 Closed-World Assumption**

Binary classification does introduce new challenges. One such challenge can be seen in the definition of a loss function that can act differently for entities within the set and entities not in the set. Since the knowledge graph may contain missing edges, the retrieved target set may be a subset of the ground truth. This in turn could result in entities being incorrectly used within the loss function (i.e. an incorrect closed-world assumption).

However, this is not necessarily problematic. We define $T^*$ to be the ground truth target set of a query and $T$ to be the retrieved target set (i.e. the set obtained when directly querying the KG). Assuming the number of entities missing from $T$ is considerably smaller than the size of $V - T$, most entities that do not belong to $T$ are also not answers to the query (i.e. not in $T^*$). This means that if we sample a relatively small subset of the complement of the retrieved target set ($V - T$), it will likely not contain entities that are in $T^*$.

In the case where we need to be certain that our sample from $V - T$ does not contain entities in $T^*$, we could restrict our sampling process to entities which could never appear in $T^*$. This is possible, for example, by sampling entities which are incompatible with the domain and range of specific relations in a query (e.g. house entities will never appear in a has_sibling(a, b) relation). Potential downsides of such methods include a slow-down during learning or a limit on the model's overall performance, as having very different entities in $T$ and in our sample from $V - T$ could prevent the model from learning the finer differences between the two sets. On the other hand, if these two sets are very similar, the model is forced to uncover differences even when they are not very apparent. In fact, it is often good practice to use so-called "hard" negative samples, which are similar to entities in $T$. A better alternative for finding entities not in $T^*$ would be to use more advanced techniques as proposed in [16].
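The two sampling strategies discussed here can be sketched as below. The entity types, the `range_type` argument and the function names are hypothetical; the sketch assumes every entity has a single known type, which real knowledge graphs do not guarantee.

```python
import random


def sample_random_negatives(entities, retrieved_answers, k, rng):
    """Sample from the complement of the retrieved target set.

    May occasionally return a true answer that is missing from the graph."""
    pool = sorted(set(entities) - set(retrieved_answers))
    return rng.sample(pool, min(k, len(pool)))


def sample_type_incompatible_negatives(entity_types, range_type, k, rng):
    """Sample only entities whose type rules them out as answers."""
    pool = sorted(e for e, t in entity_types.items() if t != range_type)
    return rng.sample(pool, min(k, len(pool)))


entity_types = {
    "alice": "Person", "bob": "Person", "carol": "Person",
    "house_1": "House", "house_2": "House",
}
rng = random.Random(0)

# Query: siblings of alice; only bob is retrieved from the (incomplete) graph.
random_negatives = sample_random_negatives(entity_types, {"bob"}, 2, rng)
safe_negatives = sample_type_incompatible_negatives(entity_types, "Person", 2, rng)
print(random_negatives, safe_negatives)
```

The second strategy is safe but yields easy negatives (houses are never siblings), which illustrates the trade-off with hard negative samples described above.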

#### **3.2 From Ranking to Classification**

Another point where binary classification differs from ranking is the way performance is measured. For binary classification, a common performance measure is the F-score, which is the harmonic mean of precision and recall, while in a ranking setting we encounter measures such as the Mean Reciprocal Rank.

While these metrics differ significantly, they are related. This becomes evident when considering that rankings can be turned into binary classifications by using a threshold. In particular, we notice that ranking metrics typically reward having entities in T higher in the ranking. As a result, having many high-ranking entities that are not in T is penalised. Effectively, these measures then provide some notion of how well T and V−T can be separated. This means that when a ranking measure is low, the derived binary classification is likely to under-perform as well. Moreover, this can result in low precision, low recall, or both, depending on where the threshold is placed in the ranking.
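The effect of the threshold position can be illustrated with a small sketch that cuts a toy ranking at different positions and computes precision, recall and F-score for the resulting answer sets. The ranking and the cut-off values are made up for illustration.

```python
def classify_by_cutoff(ranking, k):
    """Turn a ranking into a crisp answer set by keeping the top k entities."""
    return set(ranking[:k])


def precision_recall_f1(predicted, answers):
    """Standard binary classification measures for a predicted answer set."""
    true_positives = len(predicted & answers)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(answers) if answers else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


ranking = ["p1", "x", "p2", "y", "p3"]
answers = {"p1", "p2", "p3"}

results = {k: precision_recall_f1(classify_by_cutoff(ranking, k), answers)
           for k in (1, 3, 5)}
for k, (precision, recall, f1) in results.items():
    print(f"cut-off {k}: precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

A tight cut-off gives high precision and low recall, a loose one the opposite, even though the underlying ranking is identical in all three cases.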

Geometrically, there is also a correspondence between a ranking with a cut-off point and a system where all entity embeddings within a given distance are included as answers. One could view a classifier with high precision and low recall as having an embedding with relatively small volume, while viewing a classifier with high recall and low precision as having an embedding with relatively large volume instead. In this setting, the interpretation of a ranking measure would be whether entities in T are closer to our geometric query embedding than entities not in T. This measure of closeness is defined via a distance metric (e.g. the L1 norm) and can be used in the loss function [15].

#### **4 Using Axis-Aligned Boxes for Query Embedding**

As discussed in Sect. 2, an entity is a valid answer to a structured query if it satisfies the query. The ultimate aim is to find the set of all entities in the knowledge graph that are valid answers to the given query, even when satisfying its binary predicates requires an edge that is missing from the KG. As discussed, we could either use a cut-off point in the ranking to obtain a binary classifier, or train the embedding model such that it indicates a volume in the embedding space that contains the answers. In this section we present a first possible design of such a system to show its feasibility. We alter the earlier work on the Query2Box method [15] in two ways. First, we interpret the boundaries of the hyper-rectangle used for the embedding as a bounding box: all entities within the box are predicted to be answers to the query, while entities outside it are predicted not to be answers. Second, we do not use the embedding procedure proposed in Query2Box, but rather perform the embedding using the technique devised in MPQE.

Now, we could choose to embed entities using points, as is done in other query embedding methods. Then, entities that get embedded inside the box would be seen as answers to the query, while points outside of it would be seen as non-answers. This is illustrated in Fig. 2.

**Fig. 2.** A small 2D query box embedding: Here there are three queries *A*, *B* and *C*, and two entities *v* and *w*. In this case *v* is an answer to *A* and *C*, whilst *w* is only an answer to *A*. (Source [3])

But, as we will discuss in more detail in the following subsection, we can also use hyper-rectangles for the entities. The choice we make in the experiments in this paper is to consider an entity, embedded as a box, to be a valid answer to the query if there is an intersection between the two boxes. This is also illustrated in Fig. 3, for the two-dimensional case. An alternative choice could be to consider an entity an answer only if the entity box lies completely inside the query box.

To formalize this, we operate on the embedding space $\mathbb{R}^d$. What we want is to describe an axis-aligned hyper-rectangle in this space. We do this by keeping two vectors, one to indicate the center of the box and one to indicate the offset of the sides of the box. So, in the described model every entity $v \in V$ has an embedding $\mathbf{e}_v \in \mathbb{R}^{2d}$. Additionally, the embedding of a query is defined in the same way, as a vector $\mathbf{q} \in \mathbb{R}^{2d}$.

The box in $\mathbb{R}^d$ corresponding to a $2d$-dimensional vector $p = (\mathit{Cen}(p), \mathit{Off}(p)) \in \mathbb{R}^{2d}$ is defined as:

$$\mathit{Box}_p = \{ v \in \mathbb{R}^d : \mathit{Cen}(p) - \mathit{Off}(p) \preceq v \preceq \mathit{Cen}(p) + \mathit{Off}(p) \},\tag{3}$$

where $\preceq$ denotes element-wise inequality.

**Fig. 3.** A small 2D query and entity box embedding: Here there are three queries *A*, *B* and *C*, and one entity *v*. In this case *v* is an answer to *A* and *B*, but not to *C*. (Source [3])

Note that a completely analogous definition could be made by keeping two opposite corners of the box rather than a center and an offset.
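The box definition in Eq. (3) and the equivalent corner representation can be sketched as a small data structure. The class and method names are ours, and offsets are assumed to be non-negative in every dimension.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    center: List[float]
    offset: List[float]  # assumed non-negative in every dimension

    def corners(self):
        """Lower and upper corner: Cen(p) - Off(p) and Cen(p) + Off(p)."""
        lower = [c - o for c, o in zip(self.center, self.offset)]
        upper = [c + o for c, o in zip(self.center, self.offset)]
        return lower, upper

    def contains_point(self, v):
        """Element-wise check of Eq. (3) for a point embedding v."""
        lower, upper = self.corners()
        return all(lo <= x <= hi for lo, x, hi in zip(lower, v, upper))

    @classmethod
    def from_corners(cls, lower, upper):
        """Build the same box from two opposite corners."""
        center = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
        offset = [(hi - lo) / 2 for lo, hi in zip(lower, upper)]
        return cls(center, offset)


query_box = Box(center=[2.0, 1.0], offset=[1.0, 0.5])
print(query_box.corners())
print(query_box.contains_point([2.5, 1.2]), query_box.contains_point([4.0, 1.0]))
```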

#### **4.1 Boxes for Entities**

It was already mentioned in the previous section that we represent our entity embeddings with boxes as well. This idea stems from the fact that entities can play different roles in different contexts. For example, we could have a person who works at a university, but is also a member of a political party. Having a single point to represent that person forces a query asking for members of that political party and a query asking for people working at that university to overlap. If we instead use a box for the entity, the query embeddings do not have that additional constraint. The issue is also illustrated in Figs. 4 and 5. The nodes representing Alice and Bob are close to each other in one context, but far away in the other. The embedding of the entities in Fig. 5 shows that with boxes it is possible to have the entities close to each other and far away from each other at the same time. With entities as boxes, an entity can be an answer to two disjoint queries, as illustrated in Fig. 3.

#### **5 Proof of Concept**

In this section, we perform an evaluation of the system discussed above. Note that our goal is not to provide state-of-the-art results. Firstly, what we propose is just a proof of concept for an approximate embedding system which can find a set of answers for a query. But the main reason we cannot really compare with other systems is that they are evaluated with ranking metrics, as discussed in Sect. 3.

**Fig. 4.** Here Alice and Bob are closely related in the context of one specific relation (1 relation minimum), but they are not very closely related in another context (5 hops minimum). (Source [3])

**Fig. 5.** Here Alice and Bob have relatively close points (seen near the origin), but also very distant points. (Source [3])

**Fig. 6.** Used query structures for evaluation on query answering. Black nodes correspond to anchor entities, hollow nodes are the variables in the query, and the gray nodes represent the targets (answers) of the query. (Source [4])

#### **5.1 Experimental Setup**

Figure 6 shows seven distinct query graph structures. We only consider these structures when training and testing our model for the query answering task. These structures were originally proposed in GQE [9]. Each of these structures starts with actual entities from a graph (i.e. anchor entities) and ends with a set of target entities. Some of these structures are chains without any intersections (e.g. $B.\exists A, B : \text{knows}(\text{Alice}, A) \land \text{is\_related\_to}(A, B)$), whilst others only have intersections (e.g. $B.\exists B : \text{knows}(\text{Alice}, B) \land \text{is\_related\_to}(\text{Bob}, B)$), or even combinations of both. Our goal is to train a model that finds the answer set of a given query, using a query embedding. This is in contrast to other related work [4,9,15], as we want to be able to find multiple answers. As mentioned before, we could create such a set by embedding the query as a box, thus getting a hard boundary for separating entities in and not in the target set.

**Datasets.** While previous work [4,9] incorporated multiple datasets, our implementation has so far only been tested on the AIFB dataset. This dataset is a knowledge graph of an academic institution in which persons, organizations, projects, publications, and topics are the entities. Table 1 gives some statistics of this dataset and of two more datasets often used for the evaluation of approximate query answering.


**Table 1.** Statistics of the knowledge graphs that were used for training and evaluation.

**Query Generation.** To train our model we have to sample query graphs from our dataset. This is done by initially sampling anchor nodes and relations, which are later used to form graphs based on specific query patterns (Fig. 6).

After acquiring the anchor nodes and the relations connecting them, we can obtain the target set. Although this may appear straightforward, there are some caveats. The biggest one is that some queries have very large sets of potential target entities (over 100,000 answers). Because we sample edges first, these particular graphs actually appear often.

Luckily, for most query structures this was not the case, but specifically the 2-chain and 3-chain query structures occasionally suffer from it. This is likely explained by the fact that knowledge graphs contain "hub nodes", nodes with a very high degree, to which a plethora of other nodes connect via a certain relation. Table 2 shows the average size of the target sets of sampled queries for the aforementioned datasets. One interesting thing to note is that for the AM dataset the 3-chain-inter structure actually had the largest average target set. This could indicate that this problem is indeed very graph-dependent. Since this is a problem with the AIFB dataset, we limit the query target sets to a maximum of 100 answers.

We also sample for entities not in the target set to be used as negative samples during training. For the query structures that contain an intersection we incorporate hard negative samples by finding entities that would have been in the target set if the conjunctive intersections were to be relaxed to disjunctions.
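The construction of hard negatives for intersection queries can be sketched as a set operation: entities that satisfy at least one branch of the intersection, but not all of them. The function name and the toy answer sets are ours.

```python
def hard_negatives(branch_answer_sets):
    """Entities answering the disjunctive relaxation but not the conjunction."""
    branches = [set(b) for b in branch_answer_sets]
    union = set().union(*branches)
    intersection = set.intersection(*branches)
    return union - intersection


# Query: B such that knows(Alice, B) and is_related_to(Bob, B).
knows_alice = {"carol", "dave"}
related_to_bob = {"dave", "erin"}

negatives = hard_negatives([knows_alice, related_to_bob])
print(negatives)
```

Here `dave` is the only true answer, while `carol` and `erin` each satisfy exactly one conjunct and are therefore hard to distinguish from it.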


**Table 2.** Average number of answers for different query structures, across the used datasets. (Results were earlier reported in [1])

**Evaluation.** In order to test whether the model is actually able to find answers to queries that involve edges which are not in the graph, careful preparation of our data splits was necessary. We start with our original graph and mark 10% of the edges for removal (they remain in the graph at this stage). Then, we sample the graph for the query patterns. If a sample makes use of any edge marked for removal, it is added to either the validation set or the test set (10/90 split). If the sample contains no such marked edge, we put it in the training set. This way, we end up with validation and test queries that make use of at least one edge that is not in the graph seen during training.
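The split procedure can be sketched as follows. The function names are ours, queries are represented simply as lists of the edges they use, and assigning held-out queries to validation with probability 0.1 is our reading of the 10/90 split.

```python
import random


def mark_removed_edges(edges, fraction, rng):
    """Choose the edges hidden from training; they stay in the graph while sampling."""
    ordered = sorted(edges)
    return set(rng.sample(ordered, round(fraction * len(ordered))))


def assign_split(query_edges, removed, rng, validation_share=0.1):
    """Queries touching a marked edge go to validation or test, all others to train."""
    if not any(edge in removed for edge in query_edges):
        return "train"
    return "validation" if rng.random() < validation_share else "test"


rng = random.Random(0)
edges = [("knows", f"e{i}", f"e{i + 1}") for i in range(20)]
removed = mark_removed_edges(edges, 0.1, rng)
kept = [e for e in edges if e not in removed]

split_clean = assign_split(kept[:2], removed, rng)
split_held_out = assign_split([kept[0], next(iter(removed))], removed, rng)
print(len(removed), split_clean, split_held_out)
```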

After sampling, we end up with around 2 million targets and the corresponding query graphs in the training set. For the validation set we used about 30,000 targets worth of queries, and for the test set approximately 300,000 targets worth of query graphs. The validation set is also used to perform early stopping in case specific conditions were not met.

Since our method uses boxes, which allow for binary classification, we report our model's performance in the form of a confusion matrix (see Fig. 7). Given the fact that our entities are also boxes, we have more freedom to choose when an entity is considered an answer.

This is because entities now occupy more space than a single point, which allows for partial overlap with query boxes. In order to allow flexibility, we have decided that an entity is considered an answer to a query if its box representation overlaps with the box representation of the query. Naturally, other, stricter conditions could be applied, such as requiring full overlap or defining a fraction-based threshold (e.g. requiring at least 50% overlap). We expect these conditions to change based on the potential downstream task [22].
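The different acceptance conditions can be sketched for boxes given by their lower and upper corners. Measuring the overlap as the fraction of the entity box volume that lies inside the query box is our assumption; the text does not fix how the fraction would be computed.

```python
def intersects(entity, query):
    """True if the two boxes share at least one point (any overlap)."""
    return all(e_lo <= q_hi and q_lo <= e_hi
               for e_lo, e_hi, q_lo, q_hi in zip(entity[0], entity[1], query[0], query[1]))


def overlap_fraction(entity, query):
    """Fraction of the entity box volume inside the query box.

    Boxes are (lower, upper) corner pairs with positive side lengths."""
    fraction = 1.0
    for e_lo, e_hi, q_lo, q_hi in zip(entity[0], entity[1], query[0], query[1]):
        shared = min(e_hi, q_hi) - max(e_lo, q_lo)
        if shared < 0:
            return 0.0
        fraction *= shared / (e_hi - e_lo)
    return fraction


def is_answer(entity, query, min_fraction=None):
    """Any overlap by default; otherwise require at least min_fraction overlap."""
    if min_fraction is None:
        return intersects(entity, query)
    return overlap_fraction(entity, query) >= min_fraction


query = ([0.0, 0.0], [4.0, 4.0])
entity = ([3.0, 1.0], [5.0, 3.0])  # half of this box lies inside the query box

print(is_answer(entity, query),
      is_answer(entity, query, min_fraction=0.5),
      is_answer(entity, query, min_fraction=1.0))
```

With `min_fraction=1.0` the condition becomes full containment of the entity box in the query box.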

**Fig. 7.** Model of the confusion matrix used for evaluation of the results. The empty box is the representation of a query; the black and the gray boxes are, respectively, a valid and an invalid answer to the query. (Source [1])

**Fig. 8.** The MPQB model used in this proof of concept. (Adapted from a figure in [3])

**Model.** Our model has the same basic functionality as the MPQE [4] model. MPQE is used as an embedding component, but the input and output are interpreted as boxes (as illustrated in Fig. 8). MPQE first performs several steps of message passing using an R-GCN architecture after which the node states are aggregated to form the query embedding. With this query embedding a loss function is evaluated which is used as a signal (using SGD) to update the embeddings and weights in the network. For the aggregation operation we have several options (*SUM*, *MAX*, *TM*, *MLP*) at the end of our model. We test our model with some of these different aggregation functions.

Since we train an embedding matrix (as opposed to having a latent embedding to start with), we need to initialize it. We do this by sampling the 32-dimensional center vectors from a uniform distribution between 0 and 10, whilst sampling the 32-dimensional offset vectors from a Gaussian with unit variance and mean 3.
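A minimal sketch of this initialisation, written with the standard library rather than PyTorch so that it runs without dependencies, could look as follows. It assumes the offset distribution has standard deviation 1, and it leaves the rare negative offsets untouched because the text does not state how they are handled.

```python
import random


def init_box_embeddings(num_entities, dim=32, seed=0):
    """Sample box centers from U(0, 10) and box offsets from N(3, 1)."""
    rng = random.Random(seed)
    centers = [[rng.uniform(0.0, 10.0) for _ in range(dim)]
               for _ in range(num_entities)]
    # Assumption: standard deviation 1; negative offsets are not corrected here.
    offsets = [[rng.gauss(3.0, 1.0) for _ in range(dim)]
               for _ in range(num_entities)]
    return centers, offsets


centers, offsets = init_box_embeddings(num_entities=5)
print(len(centers), len(centers[0]), len(offsets[0]))
```

With offsets around 3 and centers spread over an interval of length 10, randomly initialised boxes overlap frequently in any single dimension, so the separation between answers and non-answers has to emerge during training.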

For TM aggregation, the MPQE model uses 3 layers; the TM aggregation function requires a number of message passing steps equal to the query diameter, in our case 3. For the MLP aggregation function we applied a two-layer fully connected MLP. As for the non-linearities in our model, we used the ReLU function. To update the parameters of the model we used the Adam optimizer with a learning rate of 0.01.

Our code base is built on PyTorch. In particular, we made use of the library PyTorch Geometric [7], which is a PyTorch extension specialised for graph-based models. While there are potential baselines to consider [4,9], they are not suitable for our work, because we perform binary classification whereas these methods are ranking-based. To our knowledge, there has not been any related work that performs binary classification in the context of approximate graph querying. In the area of link prediction, we do find some work, like the early work on Neural Tensor Networks [19] and more recent work on triple classification [6]. The lack of baselines did not prove to be a major concern, as our main goal was not to achieve state-of-the-art results, but rather to explore whether this direction of research may prove worthwhile.

#### **5.2 Results**

After having trained the MPQB model for over 200,000 iterations, it appeared to still not have converged. After this number of iterations the query boxes did not seem to overlap with any target boxes (i.e. no entities in T were returned). Apart from training the model for longer and on multiple epochs, there are some other settings that could still be experimented with. For example, how many samples are in each epoch (fewer samples allow for training on more epochs), whether we use T fully during training or only a subset, and how many entities should be in our sample from V−T. The latter two settings also influence how many distinct queries we could train on within a given time span. It may be worth noting that previous works [4,9,15] train using single positive samples. While we want to focus on answering queries with multiple answers, we do not necessarily need to train on multiple answers. In theory, if a method can produce a good ranking, it should also be able to produce a good classification, provided that the optimal thresholds for these rankings can be found.

Since we did not obtain direct classification results in the way we would have liked, we instead analyse the trained models to see if there are relevant insights to be found. For this we looked at models using different aggregation functions, trained on the AIFB dataset.

While we have no intersections between query boxes and target boxes, we can still examine whether the target boxes (from T) appear relatively close to the query boxes, when compared to the box representations of entities in V−T. This effectively provides some measure of whether the produced rankings are good. Table 3 shows these results. While these scores may not indicate state-of-the-art results, they do seem to suggest that the model did at least produce decent nontrivial rankings using the SUM and TM aggregators. This could suggest that further research is indeed in order. The fact that TM outperformed SUM is not surprising considering that it is a more involved method that also takes query diameter into account. This result is also in line with the findings in [4]. A more surprising result is that the MLP method did not seem to perform well at all. This could be a result of a faulty implementation, or of an implementation that simply does not work for boxes as is. Overall, the results seem promising.
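One way such a closeness measure could be computed is sketched below. Both the distance (the L1 size of the gap between two boxes) and the pairwise comparison over all answer and non-answer pairs are assumptions made for illustration; the exact procedure behind Table 3 is not specified in the text.

```python
def box_gap_l1(a, b):
    """L1 size of the gap between two boxes given as (lower, upper); 0 if they overlap."""
    return sum(max(0.0, max(a_lo, b_lo) - min(a_hi, b_hi))
               for a_lo, a_hi, b_lo, b_hi in zip(a[0], a[1], b[0], b[1]))


def percent_answers_closer(query, answer_boxes, negative_boxes):
    """Percentage of (answer, non-answer) pairs where the answer is closer to the query."""
    pairs = [(p, n) for p in answer_boxes for n in negative_boxes]
    closer = sum(1 for p, n in pairs if box_gap_l1(query, p) < box_gap_l1(query, n))
    return 100.0 * closer / len(pairs)


query = ([0.0, 0.0], [1.0, 1.0])
answer_boxes = [([2.0, 0.0], [3.0, 1.0]), ([5.0, 0.0], [6.0, 1.0])]    # gaps 1 and 4
negative_boxes = [([4.0, 0.0], [5.0, 1.0]), ([8.0, 0.0], [9.0, 1.0])]  # gaps 3 and 7

percentage = percent_answers_closer(query, answer_boxes, negative_boxes)
print(percentage)
```

Under this reading, a value of 50% corresponds to a random ordering and 100% to a ranking that perfectly separates answers from non-answers.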


**Table 3.** Percentage (%) of answers embedded closer to the query box than a non-answer, per query structure, using different aggregation functions. Tested on the AIFB dataset. (Results were earlier reported in [1])

#### **6 Conclusion and Outlook**

In this work, we looked critically at the currently prevailing evaluation strategy for approximate complex structured query algorithms for knowledge graphs. Typically, these systems take a query as an input and produce a ranking of all entities in the KG as an output. The performance of these systems is then determined using metrics typically used in information retrieval.

What we propose is to augment the current evaluations by also requiring these systems to produce a binary classification of the nodes into a class of answers and one of non-answers. This is needed because many applications can simply not work with a ranking and need a fixed set of answers to work with.

As a first proof of concept, we have adapted ideas from MPQE and Query2Box, and created an embedding algorithm that represents the queries and the entities as axis-aligned hyper-rectangles. We observed that the performance of this system is still low, and expect that future work can substantially improve upon this first attempt.

As future research directions, we see a need to expand our experiments to include other query types (disjunctions, negations, filters, etc.), in order to show the generalizability of our approach. This will, however, require a new representation for the volumes, as these operations are not possible if we stay with just boxes. For example, the negation of a box would no longer be a box.

Moreover, it needs to be investigated how our method can be applied to different kinds of graphs. This will give us insights as to what changes need to be made in terms of training data (via query generation) as well as the effects on model performance. Also, it seems worth experimenting with different geometric representations for the parts of the query (anchor, variables and targets). Finally, since our experiments were relatively small-scale, further research could also start by simply experimenting with different settings for our current architecture.

## **References**



## **Galois Connections for Patterns: An Algebra of Labelled Graphs**

David A. Cohen<sup>1</sup>, Martin C. Cooper<sup>2(B)</sup>, Peter G. Jeavons<sup>3</sup>, and Stanislav Živný<sup>3</sup>

<sup>1</sup> Royal Holloway, University of London, Egham, UK dave@cs.rhul.ac.uk <sup>2</sup> IRIT, University of Toulouse, Toulouse, France cooper@irit.fr <sup>3</sup> University of Oxford, Oxford, UK {peter.jeavons,standa.zivny}@cs.ox.ac.uk

**Abstract.** A pattern is a generic instance of a binary constraint satisfaction problem (CSP) in which the compatibility of certain pairs of variable-value assignments may be unspecified. The notion of forbidden pattern has led to the discovery of several novel tractable classes for the CSP. However, for this field to come of age it is time for a theoretical study of the algebra of patterns. We present a Galois connection between lattices composed of sets of forbidden patterns and sets of generic instances, and investigate its consequences. We then extend patterns to augmented patterns and exhibit a similar Galois connection. Augmented patterns are a more powerful language than flat (i.e. non-augmented) patterns, as we demonstrate by showing that, for any $k \geq 1$, instances with tree-width bounded by $k$ cannot be specified by forbidding a finite set of flat patterns but can be specified by a finite set of augmented patterns. A single finite set of augmented patterns can also describe the class of instances such that each instance has a weak near-unanimity polymorphism of arity $k$ (thus covering all tractable language classes). We investigate the power of forbidding augmented patterns and discuss their potential for describing new tractable classes.

**Keywords:** Constraint satisfaction · Tractability · Forbidden patterns · Galois connection · Lattice

The authors were supported by EPSRC grant EP/L021226/1. Martin Cooper was supported by the grants ANR-18-CE40-0011 and ANR-19-PI3A-000. Stanislav Živný was supported by a Royal Society University Research Fellowship. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 714532). The paper reflects only the authors' views and not the views of the ERC or the European Commission. The European Union is not liable for any use that may be made of the information contained therein.

### **1 Introduction**

The CSP (Constraint Satisfaction Problem) is a classical abstract framework for the modelling of finite-domain constrained assignment problems [8,32]. Although first inspired by applications in computer vision and artificial intelligence, its generic nature has allowed it to become a programming paradigm in its own right, used in, for example, scheduling, product configuration, planning and bioinformatics. It is well known that the CSP is NP-complete and remains so even when restricted to binary constraints, since all instances have an equivalent dual instance which is binary [22,40].

An interesting avenue of theoretical research on CSPs consists in the characterisation of tractable subproblems defined by placing a restriction on the type of constraints that can occur (the constraint language) and again it is known that it is possible to limit attention to languages of binary relations [5,10]. A major advance towards the recent characterisation of tractable constraint languages [3,41] was the algebraic approach based on the study of pointwise closure operations of constraint relations, known as polymorphisms, and the identities satisfied by these polymorphisms [1,4]. Of particular interest is the Galois connection between (sets of) polymorphisms and (sets of) relations [27]. In parallel, tractable subproblems of the CSP based on restrictions on the (hyper)-graph of constraint scopes (the constraint (hyper)graph) were also characterised [26].

In order to define new classes, we need to go beyond placing restrictions on constraint languages or on the structure of the constraint (hyper)-graph. A natural way of defining sets of instances is to consider properties of the microstructure of binary CSP instances [30]. A *pattern* can be seen as a partial microstructure (i.e. a binary CSP instance in which the compatibility of some assignments may be left undefined) or, more abstractly, as a graph with vertices labelled by names of variables and edges which may be positive or negative. Defining sets of binary CSP instances by forbidding patterns has led to the discovery of novel tractable classes [9,18]. For example, in each of the following cases, forbidding a simple 3 variable pattern defines a tractable class of binary CSP instances which strictly generalises a known tractable class:


In this paper we initiate the study of the underlying theory of forbidden (sets of) patterns, an essential foundation on which to build a characterisation of all tractable classes defined by forbidden (sets of) patterns. We begin by studying what we call flat patterns before studying augmented patterns with extra structure, such as partial orders on variables or domain values. Adding such structure is not only essential to define certain hybrid classes such as BTP [16] and EMC [19], but, as we will show in Sect. 6, also allows us to define (families of) polymorphisms [28] and bounded tree-width [20] within the same framework.

For both flat and augmented patterns, we exhibit a Galois connection between sets of patterns and sets of instances. In each case, we investigate the tractability consequences of the Galois connection, including the possibility of defining new tractable classes by combination of known tractable classes via the lattice operations. We notably show that tractable classes form a sublattice.

### **2 Definitions and Notation**

We assume that there is a countable collection of variables $X$ and a countable domain $D$ of values. A variable-value pair $(x, a)$, representing the assignment of value $a \in D$ to variable $x \in X$, is known as a *point*. A *flat pattern* (or simply a *pattern*) $P = \langle A_P, \rho_P \rangle$ is a subset $A_P$ of $X \times D$ equipped with a (partial) function $\rho_P$ from the pairs of points $(x, a), (y, b)$ of $P$ such that $x \neq y$ to $\{\text{negative}, \text{positive}\}$. Thus $P$ consists of a set of variable-value assignments $(x, a)$ together with a set of negative and positive edges representing the compatibility of pairs of assignments. In figures we represent negative edges by dashed lines and positive edges by solid lines, and points corresponding to assignments to the same variable are grouped into ovals. Three patterns $P_1, P_2, P_3$ are shown in Fig. 1.

**Fig. 1.** Examples of the occurrence of a pattern in another pattern: $P_1 \to P_2$, $P_2 \to P_1$, $P_1 \to P_3$, $P_2 \to P_3$.

We give a recursive definition of connectedness. Two points $(x, a), (y, b)$ in a pattern $P$ are *connected* if $x = y$ or $\rho_P((x, a), (y, b)) \in \{\text{negative}, \text{positive}\}$ or if $(x, a), (y, b)$ are both connected to some point $(z, c)$ of $P$. Clearly, each pattern has a decomposition into connected components according to this definition of connectedness.

A *completely specified binary CSP instance* (or simply an *instance*) is a pattern $I = \langle A_I, \rho_I \rangle$ in which the function $\rho_I$ is total, i.e. the compatibility of each pair of variable-value assignments (to distinct variables) is specified. Given an instance $I$ on $n$ variables, a *solution* to $I$ is a clique of positive edges of size $n$, which corresponds to a pairwise-compatible assignment of values to variables. The question associated with an instance is the existence of a solution. An instance $I$ is *arc consistent* if for all points $(x, a)$ of $I$ and all variables $y \neq x$ of $I$, $(x, a)$ has a support at $y$, i.e. $\exists b \in D$ such that $\{(x, a), (y, b)\}$ is a positive edge in $I$.
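The notions of instance, solution and arc consistency can be sketched in a few lines of Python. The representation used here (points as `(variable, value)` tuples and edges as a dictionary from unordered pairs of points to a sign) and all function names are assumptions made for illustration; the enumeration of solutions is brute force and is only meant for tiny examples.

```python
from itertools import combinations, product


def variables(points):
    return sorted({x for x, _ in points})


def is_instance(points, edges):
    """rho is total: every pair of points on distinct variables carries a sign."""
    return all(frozenset((p, q)) in edges
               for p, q in combinations(points, 2) if p[0] != q[0])


def solutions(points, edges):
    """Yield every choice of one point per variable that forms a positive clique."""
    domains = [[p for p in points if p[0] == x] for x in variables(points)]
    for choice in product(*domains):
        if all(edges[frozenset((p, q))] == "positive"
               for p, q in combinations(choice, 2)):
            yield choice


def is_arc_consistent(points, edges):
    """Every point has a positive edge to some point of every other variable."""
    return all(
        any(q[0] == y and edges.get(frozenset((p, q))) == "positive"
            for q in points)
        for p in points for y in variables(points) if y != p[0]
    )


# Hypothetical instance: variables x, y over {0, 1} with the constraint x != y.
points = [(x, a) for x in ("x", "y") for a in (0, 1)]
edges = {
    frozenset((("x", a), ("y", b))): "positive" if a != b else "negative"
    for a in (0, 1) for b in (0, 1)
}
print(is_instance(points, edges), list(solutions(points, edges)),
      is_arc_consistent(points, edges))
```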

A pattern $P = \langle A_P, \rho_P \rangle$ *occurs* in pattern $Q = \langle A_Q, \rho_Q \rangle$ if there is a mapping $f$ from $A_P$ to $A_Q$ which respects variables (points of $P$ that share a variable are mapped to points of $Q$ that share a variable), maps negative edges to negative edges and maps positive edges to positive edges.


We use the notation $P \to Q$ to denote that $P$ *occurs* in pattern $Q$ (and $P \nrightarrow Q$ if it does not). It is easy to see from its definition that occurrence is transitive: $P \to Q$ and $Q \to R$ implies $P \to R$. We consider two patterns $P, Q$ to be equivalent if $P \to Q$ and $Q \to P$: we write $P \approx Q$. For example, patterns $P_1$ and $P_2$ in Fig. 1 are equivalent; we notably have $P_1 \to P_2$ since $(x, a), (y, b)$ can both map to $(z, c)$. Clearly, we have $P_2 \to P_3$, and then, by transitivity, $P_1 \to P_3$. For simplicity of presentation, throughout this paper, we will talk about patterns rather than equivalence classes of patterns.
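Occurrence can be tested by brute force on small patterns. The sketch below assumes a pattern is stored as a pair `(points, edges)`, with edges a dictionary from unordered pairs of points to a sign; this representation, the helper names and the three toy patterns are hypothetical and are not the patterns of Fig. 1. The search is exponential in the number of points and is intended only as an executable reading of the definition.

```python
from itertools import product


def occurs(p, q):
    """Brute-force test of P -> Q for patterns given as (points, edges)."""
    (p_points, p_edges), (q_points, q_edges) = p, q
    pts = list(p_points)
    for images in product(list(q_points), repeat=len(pts)):
        h = dict(zip(pts, images))
        # Respect variables: all points of one variable of P must land on
        # points of a single variable of Q (two variables may share an image).
        var_image = {}
        if any(var_image.setdefault(x, h[(x, a)][0]) != h[(x, a)][0]
               for x, a in pts):
            continue
        # Preserve signs: each edge of P maps to an edge of Q of the same sign.
        if all(q_edges.get(frozenset(h[u] for u in e)) == s
               for e, s in p_edges.items()):
            return True
    return False


def equivalent(p, q):
    return occurs(p, q) and occurs(q, p)


xa, yb, yc, uc, vd = ("x", "a"), ("y", "b"), ("y", "c"), ("u", "c"), ("v", "d")
NEG = ({xa, yb}, {frozenset((xa, yb)): "negative"})
TWO_NEG = ({xa, yb, uc, vd},
           {frozenset((xa, yb)): "negative", frozenset((uc, vd)): "negative"})
MIXED = ({xa, yb, yc},
         {frozenset((xa, yb)): "negative", frozenset((xa, yc)): "positive"})

print(equivalent(NEG, TWO_NEG), occurs(NEG, MIXED), occurs(MIXED, NEG))
```

Here `TWO_NEG` is equivalent to `NEG` because both of its negative edges can be mapped onto the single negative edge, which is the same phenomenon as two variables sharing an image in the definition above.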

Each pattern P defines a corresponding set of instances in which P does not occur. For example, the pattern P3 of Fig. 1 defines a set of instances which includes all binary CSP instances with Boolean domains, since if P3 → I then the points (v, d), (v, e), (v, f) must map to three distinct values for the same variable in I, due to the positive and negative edges in P3.

Note that in previous work, it has sometimes been convenient to assume that when $P$ occurs in $Q$, distinct variables of $P$ map to distinct variables of $Q$ [11,15,19]. However, to establish a Galois connection for flat patterns, we require a looser definition of occurrence in which two or more variables of $P$ may map to the same variable in $Q$. To impose the stricter definition of occurrence (inducing an injective mapping of variables of $P$), it suffices, for each pair of distinct variables $x, y$, to add two new points $(x, a), (y, b)$ to $A_P$ and an extra dummy positive edge between points $(x, a), (y, b)$ in $P$; this prevents $x, y$ mapping to the same variable in $Q$ (and only changes the semantics of $P$ in a trivial way). A more elegant solution (in order to impose an injective mapping of variables) is to augment the patterns with a not-equal-to relation between variables, which is possible in the framework of augmented patterns discussed in Sect. 6.

We consider sets $S$ of patterns. These sets will usually be finite, indeed, often a singleton. When forbidden, a set $S$ of patterns defines a set of instances (those instances in which none of the patterns in $S$ occurs). Such sets $T$ of instances are hereditary in the sense that $(I \in T) \wedge (I' \subseteq I) \implies (I' \in T)$, where $I' \subseteq I$ means $(A_{I'} \subseteq A_I) \wedge (\rho_{I'} = \rho_I|_{A_{I'}})$. Many, but not all, classes of interest are hereditary. For example, for any $k$, the set of instances whose tree-width is bounded by $k$ is hereditary. On the other hand, the set of arc-consistent instances is not hereditary, since a value which has a support at another variable in an instance $I$ will not necessarily have a support in $I' \subset I$. Thus forbidden flat patterns alone cannot express any class of instances which requires arc consistency (or a higher level of consistency) [36]. Nevertheless, we will see in Sect. 6 how a combination of augmented patterns and filters on instances provides a very expressive language in which to define classes of instances, allowing us to express such classes.

In order to obtain a Galois connection we consider sets of generic instances, where a generic instance can be viewed as a partially-specified instance and is, in fact, again just a pattern. However, the lattice structure on sets of patterns is different depending on whether we view these patterns as partially-specified instances or as forbidden sub-instances. When defining tractability of sets of generic instances we filter instances keeping only those that are completely specified.

**Definition 1.** *A set* $T$ *of generic instances is* tractable *if there is a polynomial-time algorithm which decides all completely-specified instances in* $T$*. A set* $S$ *of forbidden patterns is tractable if the corresponding set of instances in which none of the patterns in* $S$ *occur is tractable.*

To define lattices of (sets of) instances and (sets of) patterns, we also require the following operation on patterns: if $P, Q$ are patterns, then $P + Q$ is a single pattern consisting of (copies of) the two patterns $P$ and $Q$ (without any common points and without any edges between $P$ and $Q$). We call this the *juxtaposition* of the two patterns $P$ and $Q$. Observe that $P + P \approx P$ (since $P + P \to P$ follows from the definition of occurrence, which allows us to map the two copies of $P$ to $P$). If $S_1, S_2$ are sets of patterns, then $S_1 + S_2$ is the set of patterns $\{P + Q \mid P \in S_1 \wedge Q \in S_2\}$.
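Juxtaposition can be sketched by tagging the variables of each operand so that the two copies share no points. The tagging scheme, the `(points, edges)` representation, the brute-force `occurs` helper and the toy patterns below are assumptions made for illustration. The sketch checks $P + P \approx P$ and that $P + Q$ occurs in a pattern exactly when both $P$ and $Q$ do, on a few hypothetical patterns.

```python
from itertools import product


def occurs(p, q):
    """Brute-force test of P -> Q for patterns given as (points, edges)."""
    (p_points, p_edges), (q_points, q_edges) = p, q
    pts = list(p_points)
    for images in product(list(q_points), repeat=len(pts)):
        h = dict(zip(pts, images))
        var_image = {}
        if any(var_image.setdefault(x, h[(x, a)][0]) != h[(x, a)][0]
               for x, a in pts):
            continue  # some variable of P would be split over two variables
        if all(q_edges.get(frozenset(h[u] for u in e)) == s
               for e, s in p_edges.items()):
            return True
    return False


def juxtapose(p, q):
    """P + Q: disjoint copies of P and Q with no edges between them."""
    def tag(pattern, i):
        points, edges = pattern
        return ({((i, x), a) for x, a in points},
                {frozenset(((i, x), a) for x, a in e): s
                 for e, s in edges.items()})
    (pp, pe), (qp, qe) = tag(p, 0), tag(q, 1)
    return pp | qp, {**pe, **qe}


xa, yb, yc = ("x", "a"), ("y", "b"), ("y", "c")
NEG = ({xa, yb}, {frozenset((xa, yb)): "negative"})
POS = ({xa, yb}, {frozenset((xa, yb)): "positive"})
MIXED = ({xa, yb, yc},
         {frozenset((xa, yb)): "negative", frozenset((xa, yc)): "positive"})

double = juxtapose(NEG, NEG)
print(occurs(double, NEG) and occurs(NEG, double))   # P + P is equivalent to P
for target in (NEG, POS, MIXED):
    both = occurs(NEG, target) and occurs(POS, target)
    print(occurs(juxtapose(NEG, POS), target) == both)
```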

We also require another operation on pairs of patterns, which can be seen as the greatest lower bound of the two patterns. If $P, Q$ are patterns, then $P \times Q$ is a single pattern built by forming the juxtaposition of all patterns $R$ such that $(R \to P) \wedge (R \to Q)$. We say that such patterns $R$ are *common factors* of $P$ and $Q$. We only include patterns $R$ which are maximal in the sense that there is no $R' \not\approx R$ such that $R \to R'$ and $(R' \to P) \wedge (R' \to Q)$. Observe that including only maximal $R$ ensures that we have $P \times P \approx P$. The operation $\times$ is illustrated in Fig. 2. In this example, the patterns $P$ and $Q$ have only two maximal common factors (modulo the equivalence relation $\approx$) and $P \times Q$ is the juxtaposition of these two common factors. Note that $P_1$ and $P_2$ (shown in Fig. 1) are both common factors of $P$ and $Q$, but since $P_1 \approx P_2$ we only need to include one of these patterns in $P \times Q$. If $S_1, S_2$ are sets of patterns, then $S_1 \times S_2$ is the set of patterns $\{P \times Q \mid P \in S_1 \wedge Q \in S_2\}$.

**Fig. 2.** The operation $P \times Q$.

The following lemmas provide a logical interpretation of the + and × operations on patterns.

**Lemma 1.** *For all patterns* $P_1, P_2, I$*, we have* $P_1 + P_2 \nrightarrow I$ *if and only if* $(P_1 \nrightarrow I \vee P_2 \nrightarrow I)$*.*

*Proof.* For all patterns $P_1, P_2, I$, $P_1 + P_2 \to I$ if and only if $(P_1 \to I \wedge P_2 \to I)$ by the definition of $P_1 + P_2$. By contraposition, for all patterns $P_1, P_2, I$, $P_1 + P_2 \nrightarrow I$ if and only if $(P_1 \nrightarrow I \vee P_2 \nrightarrow I)$.

**Lemma 2.** *For all patterns* $P, I_1, I_2$*,* $P \nrightarrow I_1 \times I_2$ *if and only if* $(P \nrightarrow I_1 \vee P \nrightarrow I_2)$*.*

*Proof.* By contraposition, it suffices to show that $P \to I_1 \times I_2$ if and only if $P \to I_1 \wedge P \to I_2$. If $P \to I_1 \wedge P \to I_2$, then $P$ is a common factor of $I_1$ and $I_2$ and hence $P \to I_1 \times I_2$. On the other hand, if $P \to I_1 \times I_2$, then, due to the lack of edges between the connected components of $I_1 \times I_2$, $P$ must be the juxtaposition of patterns $P_1, \ldots, P_r$ where for each $i = 1, \ldots, r$, $P_i \to R_i$ for some $R_i$ which is one of the connected components of $I_1 \times I_2$. Each connected component $R_i$ of $I_1 \times I_2$ satisfies $R_i \to R'_i$ for some common factor $R'_i$ of $I_1$ and $I_2$. By transitivity of the occurrence relation and by definition of $I_1 \times I_2$, we have $P_i \to I_1$ and $P_i \to I_2$ (for $i = 1, \ldots, r$) and hence $P \to I_1$ and $P \to I_2$.

### **3 The Two Lattices**

Let $\mathcal{P}$ be the set of all patterns and $\mathcal{I}$ be the set of all generic instances. Let $\mathbb{T}$ be the set of all subsets of $\mathcal{I}$. Let $\mathbb{S}$ be the set of all subsets of $\mathcal{P}$. In this section we show that $\mathbb{S}$ and $\mathbb{T}$, taken modulo suitable equivalence relations, have lattice structures with partial orders based on notions of occurrence. Although $\mathcal{P} = \mathcal{I}$, the two structures are distinct since they do not have the same partial order.

We require two different definitions of occurrence of one set of patterns in another, depending on whether the sets of patterns are considered as forbidden patterns or sets of generic instances. For $S_1, S_2 \in \mathbb{S}$, we write $S_1 \rightsquigarrow S_2$ to mean that $\forall Q \in S_2, \exists P \in S_1$ such that $P \to Q$. We write $S_1 \leftrightsquigarrow S_2$ if $S_1 \rightsquigarrow S_2$ and $S_2 \rightsquigarrow S_1$. For $T_1, T_2 \in \mathbb{T}$, we write $T_1 \to T_2$ to mean $\forall P \in T_1, \exists Q \in T_2$ such that $P \to Q$. We write $T_1 \leftrightarrow T_2$ if $T_1 \to T_2$ and $T_2 \to T_1$. It follows directly from their definitions that $\leftrightsquigarrow$ and $\leftrightarrow$ are equivalence relations.

Let $\mathcal{T}$ be the set of all equivalence classes (according to $\leftrightarrow$) of sets of generic instances. Let $\mathcal{S}$ be the set of all equivalence classes (according to $\leftrightsquigarrow$) of sets of forbidden patterns.

It is not difficult to see that $\to$ is a partial order on $\mathcal{T}$ and that $\rightsquigarrow$ is a partial order on $\mathcal{S}$. It follows that $\mathcal{T}$ and $\mathcal{S}$ both have a lattice structure [2,21]. The following proposition shows that the set $\mathcal{T}$ has a lattice structure with meet and join operations $\times$ and $\cup$, whereas the set $\mathcal{S}$ has a lattice structure with meet and join operations $+$ and $\cup$.

**Proposition 1.** *For all* $S_1, S_2 \in \mathbb{S}$*, (1)* $S_2 \rightsquigarrow S_1 \Leftrightarrow S_1 + S_2 \leftrightsquigarrow S_1$ *and (2)* $S_2 \rightsquigarrow S_1 \Leftrightarrow S_1 \cup S_2 \leftrightsquigarrow S_2$*. For all* $T_1, T_2 \in \mathbb{T}$*, (3)* $T_1 \to T_2 \Leftrightarrow T_1 \times T_2 \leftrightarrow T_1$ *and (4)* $T_1 \to T_2 \Leftrightarrow T_1 \cup T_2 \leftrightarrow T_2$*.*

*Proof.* (1) ⇒: $S_2 \rightsquigarrow S_1$ means $\forall P \in S_1, \exists Q \in S_2$ such that $Q \to P$ and hence $P + Q \to P$. Thus $S_1 + S_2 \rightsquigarrow S_1$. Clearly $S_1 \rightsquigarrow S_1 + S_2$.

(1) ⇐: $S_1 + S_2 \rightsquigarrow S_1$ means $\forall P \in S_1, \exists R + Q \in S_1 + S_2$ such that $R + Q \to P$, which implies $Q \to P$. Hence $S_2 \rightsquigarrow S_1$.

(2) ⇒: $S_2 \rightsquigarrow S_1$ means $\forall P \in S_1, \exists Q \in S_2$ such that $Q \to P$. Now, since $\forall Q$, $Q \to Q$, we have $\forall R \in S_1 \cup S_2, \exists Q \in S_2$ such that $Q \to R$. Hence $S_2 \rightsquigarrow S_1 \cup S_2$. Clearly $S_1 \cup S_2 \rightsquigarrow S_2$.

(2) ⇐: $S_2 \rightsquigarrow S_1 \cup S_2$ implies that $\forall P \in S_1, \exists Q \in S_2$ such that $Q \to P$ and so $S_2 \rightsquigarrow S_1$.

(3) ⇒: $T_1 \to T_2$ means that $\forall I \in T_1, \exists J \in T_2$ such that $I \to J$, so $I$ is a common factor of $I$ and $J$ and hence $I \to I \times J$. Thus $T_1 \to T_1 \times T_2$. Also, by definition of $\times$, $T_1 \times T_2 \to T_1$.

(3) ⇐: $T_1 \to T_1 \times T_2$ means that $\forall I \in T_1, \exists I' \times J \in T_1 \times T_2$ such that each connected component of $I$ occurs in a common factor of $I'$ and $J$, and hence each connected component of $I$ occurs in $J$ and so $I \to J$. Thus $T_1 \to T_2$.

(4) ⇒: $T_1 \to T_2$ means $\forall I \in T_1, \exists J \in T_2$ such that $I \to J$. Thus $T_1 \cup T_2 \to T_2$. Clearly $T_2 \to T_1 \cup T_2$.

(4) ⇐: $T_1 \cup T_2 \to T_2$ implies $\forall I \in T_1, \exists J \in T_2$ such that $I \to J$, which is exactly $T_1 \to T_2$.

The following lemmas are not essential for the lattice structure of S and T , but will be useful later.

**Lemma 3.** *If* $S_1 \supseteq S_2$ *then* $S_1 \rightsquigarrow S_2$*. If* $T_1 \subseteq T_2$ *then* $T_1 \to T_2$*.*

*Proof.* If $S_1 \supseteq S_2$ then $\forall Q \in S_2, \exists P = Q \in S_1$ such that $P \to Q$. If $T_1 \subseteq T_2$ then $\forall P \in T_1, \exists Q = P \in T_2$ such that $P \to Q$.

**Lemma 4.** *For all sets of patterns* $S_1, S_2$*,* $S_1 + S_2 \rightsquigarrow S_1 \cap S_2$ *and* $S_1 \cap S_2 \to S_1 \times S_2$*.*

*Proof.* We have $\forall P \in S_1 \cap S_2$, $P \approx P + P \in S_1 + S_2$. Hence $S_1 + S_2 \rightsquigarrow S_1 \cap S_2$. Also $\forall I \in S_1 \cap S_2$, $I \approx I \times I \in S_1 \times S_2$. Hence $S_1 \cap S_2 \to S_1 \times S_2$.

If we consider that $S_1 \leq S_2$ if $S_2 \rightsquigarrow S_1$, then the minimal element in the lattice $\mathcal{S}$ is the empty set of patterns and the maximal element is $\{P_\emptyset\}$ where $P_\emptyset$ is the pattern containing no points or edges. If we consider that $T_1 \leq T_2$ if $T_1 \to T_2$ then the minimal element of $\mathcal{T}$ is the empty set of patterns and the maximal element is the set of all patterns.

The two lattices S and T are both distributive, as shown by the following proposition.

**Proposition 2.** *For all* $S_1, S_2, S_3 \in \mathbb{S}$*, we have* $S_1 + (S_2 \cup S_3) \leftrightsquigarrow (S_1 + S_2) \cup (S_1 + S_3)$ *and for all* $T_1, T_2, T_3 \in \mathbb{T}$*, we have* $T_1 \times (T_2 \cup T_3) \leftrightarrow (T_1 \times T_2) \cup (T_1 \times T_3)$*.*

*Proof.* These follow immediately from the definitions.

### **4 The Galois Connection**

The Galois connection is based on two functions $f : \mathbb{S} \to \mathbb{T}$ and $g : \mathbb{T} \to \mathbb{S}$, defined as follows.

$$\begin{aligned} f(S) &= \{ I \in \mathcal{I} \mid \forall P \in S,\ P \nrightarrow I \} \\ g(T) &= \{ P \in \mathcal{P} \mid \forall I \in T,\ P \nrightarrow I \} \end{aligned}$$

**Theorem 1.** *There is an antitone Galois connection between* $\mathcal{S}$ *and* $\mathcal{T}$*.*

*Proof.* The functions $f, g$, applied to equivalence classes of $\mathbb{S}$ and $\mathbb{T}$, define a Galois connection between $\mathcal{S}$ and $\mathcal{T}$ if $\forall S \in \mathcal{S}, \forall T \in \mathcal{T}$, $T \leq f(S) \Leftrightarrow S \leq g(T)$. This corresponds to $(T \to f(S)) \Leftrightarrow (g(T) \rightsquigarrow S)$, which holds because $(T \to f(S))$ and $(g(T) \rightsquigarrow S)$ are both equivalent to $\forall P \in S, \forall I \in T$, $P \nrightarrow I$. We therefore have a Galois connection between $\mathcal{S}$ and $\mathcal{T}$.
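The defining equivalence of the Galois connection can be checked exhaustively on a small finite universe of patterns. The sketch below is an illustration under stated assumptions: the universe `U` of four toy patterns, the hashable pattern encoding built by `make`, and the function names are all hypothetical, and `f` and `g` are restricted to `U` rather than ranging over all patterns.

```python
from functools import lru_cache
from itertools import combinations, product


def make(points, edges):
    """Hashable pattern: (points, {(unordered pair of points, sign), ...})."""
    return (frozenset(points),
            frozenset((frozenset((p, q)), s) for p, q, s in edges))


@lru_cache(maxsize=None)
def occurs(p, q):
    """Brute-force test of P -> Q (occurrence of pattern p in pattern q)."""
    (pp, pe), (qp, qe) = p, q
    sign_in_q = dict(qe)
    pts = list(pp)
    for images in product(list(qp), repeat=len(pts)):
        h = dict(zip(pts, images))
        var_image = {}
        if any(var_image.setdefault(x, h[(x, a)][0]) != h[(x, a)][0]
               for x, a in pts):
            continue
        if all(sign_in_q.get(frozenset(h[u] for u in e)) == s for e, s in pe):
            return True
    return False


def f(S, universe):
    return {I for I in universe if not any(occurs(P, I) for P in S)}


def g(T, universe):
    return {P for P in universe if not any(occurs(P, I) for I in T)}


def instances_leq(T1, T2):   # T1 -> T2 on sets of generic instances
    return all(any(occurs(P, Q) for Q in T2) for P in T1)


def patterns_leq(S1, S2):    # S1 ~> S2 on sets of forbidden patterns
    return all(any(occurs(P, Q) for P in S1) for Q in S2)


def subsets(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


xa, yb, yc = ("x", "a"), ("y", "b"), ("y", "c")
EMPTY = make([], [])
NEG = make([xa, yb], [(xa, yb, "negative")])
POS = make([xa, yb], [(xa, yb, "positive")])
MIXED = make([xa, yb, yc], [(xa, yb, "negative"), (xa, yc, "positive")])
U = [EMPTY, NEG, POS, MIXED]

galois_holds = all(
    instances_leq(T, f(S, U)) == patterns_leq(g(T, U), S)
    for S in subsets(U) for T in subsets(U)
)
print(galois_holds)
```

Restricting to a finite universe keeps the check executable; the argument in the proof goes through unchanged as long as the sets being compared are drawn from that universe.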

We now study this Galois connection in more detail.

**Proposition 3.** *For all* $S_1, S_2 \in \mathbb{S}$*, if* $S_1 \rightsquigarrow S_2$ *then* $f(S_1) \subseteq f(S_2)$*. For all* $T_1, T_2 \in \mathbb{T}$*, if* $T_1 \to T_2$ *then* $g(T_2) \subseteq g(T_1)$*.*

*Proof.* Suppose $S_1 \rightsquigarrow S_2$. Then $\forall P_2 \in S_2, \exists P_1 \in S_1$ such that $P_1 \to P_2$. Consider $I \in f(S_1)$. By definition of $f$, $\forall P_1 \in S_1$, $P_1 \nrightarrow I$. It follows that $I \in f(S_2)$ since otherwise we would have some $P_2 \in S_2$ such that $P_2 \to I$ and some $P_1 \in S_1$ with $P_1 \to P_2 \to I$, which contradicts $P_1 \nrightarrow I$.

Suppose $T_1 \to T_2$. Then $\forall I_1 \in T_1, \exists I_2 \in T_2$ such that $I_1 \to I_2$. Consider $P \in g(T_2)$. By definition of $g$, $\forall I_2 \in T_2$, $P \nrightarrow I_2$. It follows that $P \in g(T_1)$ since otherwise we would have some $I_1 \in T_1$ such that $P \to I_1$ and some $I_2 \in T_2$ such that $P \to I_1 \to I_2$, which contradicts $P \nrightarrow I_2$.

We immediately have the following corollary.

**Corollary 1.** *For all* $S_1, S_2 \in \mathbb{S}$*,* $S_1 \rightsquigarrow S_2 \Rightarrow f(S_1) \to f(S_2)$*. For all* $T_1, T_2 \in \mathbb{T}$*,* $T_1 \to T_2 \Rightarrow g(T_1) \rightsquigarrow g(T_2)$*.*

**Proposition 4.** *For any sets of patterns* $S_1, S_2$*,* $f(S_1) = f(S_2)$ *if and only if* $S_1 \leftrightsquigarrow S_2$*.*

*Proof.* Suppose $f(S_1) = f(S_2)$. Then $\forall I$, $(\forall P \in S_1, P \nrightarrow I) \Leftrightarrow (\forall P \in S_2, P \nrightarrow I)$. This is equivalent to $\forall I$, $(\exists P \in S_1, P \to I) \Leftrightarrow (\exists P \in S_2, P \to I)$. It follows, by setting $I = P' \in S_2$, that $\forall P' \in S_2, \exists P \in S_1$ such that $P \to P'$, and hence $S_1 \rightsquigarrow S_2$. By setting $I = P \in S_1$, by a symmetrical argument, we obtain $S_2 \rightsquigarrow S_1$, and hence $S_1 \leftrightsquigarrow S_2$.

Now suppose that $S_1 \leftrightsquigarrow S_2$. Then, by Proposition 3, we can deduce that $f(S_1) = f(S_2)$.

It is important to observe that $\mathbb{T}$ includes sets of partially-specified instances. If we considered only sets of completely-specified instances in $\mathbb{T}$, then Proposition 4 would not hold. For example, consider $S_1$ and $S_2$ shown in Fig. 3. It is easy to see that we do not have $S_1 \leftrightsquigarrow S_2$, even though $S_1$ and $S_2$ define the same set of completely-specified instances when forbidden, namely those instances which have only positive edges or only negative edges. They do not define the same set of *generic instances*, since, for example, the single pattern $Q \in S_2$ is in $f(S_1)$ but not $f(S_2)$.

**Fig. 3.** The sets of patterns $S_1 = \{P_1, P_2\}$ and $S_2 = \{Q\}$ define the same set of completely specified instances when forbidden, but $f(S_1) \neq f(S_2)$.

**Proposition 5.** *For any sets of generic instances* $T_1, T_2$*,* $g(T_1) = g(T_2)$ *if and only if* $T_1 \leftrightarrow T_2$*.*

*Proof.* Suppose $g(T_1) = g(T_2)$. Then $\forall P$, $(\forall I \in T_1, P \nrightarrow I) \Leftrightarrow (\forall I \in T_2, P \nrightarrow I)$. This is equivalent to $\forall P$, $(\exists I \in T_1, P \to I) \Leftrightarrow (\exists I \in T_2, P \to I)$. Setting $P = I \in T_1$, we obtain $\forall I \in T_1, \exists I' \in T_2$ such that $I \to I'$, and hence $T_1 \to T_2$. Setting $P = I \in T_2$, by a symmetrical argument, we obtain $T_2 \to T_1$, and hence $T_1 \leftrightarrow T_2$.

Now suppose that $T_1 \leftrightarrow T_2$. By Proposition 3, we can deduce that $g(T_1) = g(T_2)$.

We now show to what extent the lattice structure of S and T is preserved via the mappings f and g.

**Theorem 2.** $\forall S_1, S_2 \in \mathbb{S}$*,* $f(S_1) \cup f(S_2) = f(S_1 + S_2)$*.*

*Proof.* For $i = 1, 2$, $f(S_i) = \{I \mid \forall P \in S_i,\ P \nrightarrow I\}$. So $f(S_1) \cup f(S_2) = \{I \mid (\forall P \in S_1,\ P \nrightarrow I) \vee (\forall P \in S_2,\ P \nrightarrow I)\} = \{I \mid \forall P_1 \in S_1, \forall P_2 \in S_2\ (P_1 \nrightarrow I \vee P_2 \nrightarrow I)\}$. Thus, by Lemma 1, $f(S_1) \cup f(S_2) = \{I \mid \forall P_1 \in S_1, \forall P_2 \in S_2\ (P_1 + P_2 \nrightarrow I)\} = \{I \mid \forall P_1 + P_2 \in S_1 + S_2\ (P_1 + P_2 \nrightarrow I)\} = f(S_1 + S_2)$.

**Theorem 3.** $\forall S_1, S_2 \in \mathbb{S}$*,* $f(S_1) \cap f(S_2) = f(S_1 \cup S_2)$*.*

*Proof.* $f(S_1 \cup S_2) = \{I \mid \forall P \in S_1 \cup S_2,\ P \nrightarrow I\} = \{I \mid \forall P_1 \in S_1,\ P_1 \nrightarrow I\} \cap \{I \mid \forall P_2 \in S_2,\ P_2 \nrightarrow I\} = f(S_1) \cap f(S_2)$.

The lattice structure and Theorems 2 and 3 are illustrated in Fig. 4.

**Fig. 4.** The function $f$ from $\mathcal{S}$ to $\mathcal{T}$.

**Theorem 4.** $\forall T_1, T_2 \in \mathbb{T}$*,* $g(T_1) \cap g(T_2) = g(T_1 \cup T_2)$*.*

*Proof.* $g(T_1 \cup T_2) = \{P \mid \forall I \in T_1 \cup T_2,\ P \nrightarrow I\} = \{P_1 \mid \forall I \in T_1,\ P_1 \nrightarrow I\} \cap \{P_2 \mid \forall I \in T_2,\ P_2 \nrightarrow I\} = g(T_1) \cap g(T_2)$.

**Theorem 5.** $\forall T_1, T_2 \in \mathbb{T}$*,* $g(T_1) \cup g(T_2) = g(T_1 \times T_2)$*.*

*Proof.* $g(T_1 \times T_2) = \{P \mid \forall I \in T_1 \times T_2,\ P \nrightarrow I\} = \{P \mid \forall I_1 \in T_1, \forall I_2 \in T_2,\ P \nrightarrow I_1 \times I_2\}$. By Lemma 2, this is equal to $\{P \mid \forall I_1 \in T_1, \forall I_2 \in T_2,\ (P \nrightarrow I_1 \vee P \nrightarrow I_2)\} = \{P \mid \forall I_1 \in T_1,\ P \nrightarrow I_1\} \cup \{P \mid \forall I_2 \in T_2,\ P \nrightarrow I_2\} = g(T_1) \cup g(T_2)$.

Theorems 4 and 5 are illustrated in Fig. 5.

**Definition 2.** *A set* T *of patterns is* downward-closed *if for all patterns* P, Q*,* (P → Q) ∧ (Q ∈ T) ⇒ (P ∈ T)*. A set of patterns* S *is* upward-closed *if for all patterns* P, Q*,* (P → Q) ∧ (P ∈ S) ⇒ (Q ∈ S)*.*

In the case of upward-closed sets of forbidden patterns and/or downward-closed sets of generic instances, the lattices, and the corresponding Galois connection, become simpler, as the following proposition shows. In this case the two lattices become lattices of sets with meet and join operations $\cap$ and $\cup$. In practice, however, we are generally interested in small sets of forbidden patterns which cannot be upward-closed (otherwise they would be infinite).

**Proposition 6.** *If* $S_1, S_2$ *are upward-closed, then* $S_1 + S_2 \leftrightsquigarrow S_1 \cap S_2$*. If* $T_1, T_2$ *are downward-closed, then* $T_1 \cap T_2 \leftrightarrow T_1 \times T_2$*.*

*Proof.* $\forall P + Q \in S_1 + S_2$, we have $P \to P + Q$ and $Q \to P + Q$. By the upward closedness of both $S_1$ and $S_2$, it follows that $P + Q \in S_1 \cap S_2$. Thus $S_1 \cap S_2 \rightsquigarrow S_1 + S_2$. By Lemma 4, we have $S_1 + S_2 \rightsquigarrow S_1 \cap S_2$, and hence $S_1 + S_2 \leftrightsquigarrow S_1 \cap S_2$.

$\forall P \times Q \in T_1 \times T_2$, $P \times Q \to P$ and $P \times Q \to Q$. If $T_1, T_2$ are downward-closed, then $P \times Q \in T_1 \cap T_2$. Thus $T_1 \times T_2 \to T_1 \cap T_2$. Together with Lemma 4, we have $T_1 \cap T_2 \leftrightarrow T_1 \times T_2$.

**Fig. 5.** The function $g$ from $\mathcal{T}$ to $\mathcal{S}$.

#### **5 Tractability Consequences of the Galois Connection**

In this section we show that tractable sets of patterns form a sublattice of S.

Recall that we say that T ∈ 𝒯 is tractable if there is a polynomial-time algorithm to decide all completely-specified instances in T. We consider that incompletely-specified instances (i.e. generic instances with at least one pair of points not joined by a (positive or negative) edge) can be recognised as such in polynomial time and hence do not affect the tractability of T. A consequence of this is that it is not true that T<sub>1</sub> → T<sub>2</sub> ∧ (T<sub>2</sub> tractable) ⇒ T<sub>1</sub> tractable. For example, T<sub>2</sub> could be trivially tractable because it contains no completely-specified instance even when T<sub>1</sub> is the set of all binary CSP instances. However, we have the following important result.

**Proposition 7.** *If* T<sup>1</sup> = f(S1) *and* T<sup>2</sup> = f(S2)*, then* (T<sup>1</sup> → T<sup>2</sup> ∧ (T<sup>2</sup> *tractable*)) ⇒ T<sup>1</sup> *tractable.*

*Proof.* Let T<sub>1</sub> = f(S<sub>1</sub>) and T<sub>2</sub> = f(S<sub>2</sub>), where T<sub>1</sub> → T<sub>2</sub>. By Proposition 3, we have g(T<sub>2</sub>) ⊆ g(T<sub>1</sub>) and so by Lemma 3, g(T<sub>1</sub>) ⇝ g(T<sub>2</sub>). By definition of the functions f and g, we have f(g(f(S))) = f(S) for all S, and so f(g(T<sub>1</sub>)) = f(S<sub>1</sub>) and f(g(T<sub>2</sub>)) = f(S<sub>2</sub>). It follows from Proposition 4 that S<sub>1</sub> ↭ g(T<sub>1</sub>) and S<sub>2</sub> ↭ g(T<sub>2</sub>). Thus S<sub>1</sub> ↭ g(T<sub>1</sub>) ⇝ g(T<sub>2</sub>) ↭ S<sub>2</sub>. By transitivity of ⇝, we have S<sub>1</sub> ⇝ S<sub>2</sub> and, by Proposition 3, T<sub>1</sub> = f(S<sub>1</sub>) ⊆ f(S<sub>2</sub>) = T<sub>2</sub>. It follows that if T<sub>2</sub> is tractable, then so is T<sub>1</sub>.

This means that it may be possible to classify the complexity of all classes f(S) for all finite sets S ∈ S. Indeed we conjecture that there is a P/NP-complete dichotomy. This has already been proved for sets of patterns containing only negative edges [9].

The following proposition tells us that the tractable sets of patterns form a sub-lattice of S.

**Proposition 8.** *If* S1, S<sup>2</sup> *are tractable sets of patterns, then so are* S<sup>1</sup> ∪ S<sup>2</sup> *and* S<sup>1</sup> + S2*.*

*Proof.* f(S<sup>1</sup> +S2) = f(S1)∪f(S2) and hence can be solved in polynomial time if f(S1) and f(S2) can be. A similar remark holds for f(S<sup>1</sup> ∪ S2) = f(S1) ∩ f(S2).
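The two identities used in this proof can be sanity-checked by brute force on a small toy model. The sketch below rests on loudly stated assumptions: patterns and instances are finite sets of points, occurrence is subset inclusion, and P + Q is modelled by set union, so that P + Q occurs in I exactly when both P and Q occur in I. These are stand-ins chosen to mimic the behaviour of + and are not the paper's definitions. The helper names `powerset`, `f` and `plus` are hypothetical.

```python
from itertools import combinations, product

def powerset(points):
    """All subsets of `points`, as frozensets."""
    return [frozenset(c) for r in range(len(points) + 1)
            for c in combinations(points, r)]

def f(patterns, universe):
    """Instances of the universe in which no pattern of the set occurs."""
    return {i for i in universe if not any(p <= i for p in patterns)}

def plus(s1, s2):
    """Toy analogue of S1 + S2: all pairwise combinations P + Q."""
    return {p | q for p, q in product(s1, s2)}

universe = powerset(range(4))
s1 = {frozenset({0, 1}), frozenset({2})}
s2 = {frozenset({1, 3}), frozenset({0, 2})}

# f(S1 + S2) = f(S1) ∪ f(S2)  and  f(S1 ∪ S2) = f(S1) ∩ f(S2)
union_law = f(plus(s1, s2), universe) == f(s1, universe) | f(s2, universe)
meet_law = f(s1 | s2, universe) == f(s1, universe) & f(s2, universe)
print(union_law, meet_law)
```

Read algorithmically, the first identity says that an instance of f(S<sub>1</sub> + S<sub>2</sub>) can be handed to whichever of the two polynomial-time procedures applies, after testing in polynomial time which of S<sub>1</sub>, S<sub>2</sub> fails to occur.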

We can observe that the finite sets of 𝒮 form a sublattice of 𝒮 since S<sub>1</sub> + S<sub>2</sub> and S<sub>1</sub> ∪ S<sub>2</sub> are finite if S<sub>1</sub>, S<sub>2</sub> are finite. It follows that the finite tractable sets of 𝒮 form a sublattice. We are particularly interested in finite sets of patterns, since detecting the absence of finite sets of patterns can be achieved in polynomial time, whereas testing the absence of an infinite set of patterns may not even be computable. We can observe that there are infinite sets of patterns S such that f(S) is tractable but for no finite subset S′ of S is f(S′) tractable, e.g. acyclic instances that can be defined by forbidding cycles of all lengths but by no finite set of flat patterns [11].

## **6 Augmented Patterns: Motivation**

We can make the language of patterns much richer by adding relations to patterns (and possibly quantifying over these relations). A *flat* pattern (the kind of pattern we have studied up to now in this paper) has only the binary relations of compatibility between points (positive edges), incompatibility between points (negative edges) and the equivalence relation between points corresponding to assignments to the same variable (represented in figures by ovals representing its equivalence classes). Suppose that we add a new relation, such as an ordering or a colouring of the points of the pattern. We call this an *augmented pattern*. In this section, we motivate the study of augmented patterns by showing that they can be used to define interesting tractable classes that cannot be defined using flat patterns. Examples of such augmented patterns are a pattern in which we add an ordering between points (the new relation is binary) or a colouring of points (in which case the new relation is unary). For these new relations to be meaningful, they must satisfy the basic properties of, for example, orderings or colourings. To impose this we can replace a single pattern P by a set of patterns, one being the augmented pattern P and the others designed in such a way as to impose the required properties of the new relation.

Consider a binary relation R<sub>&lt;</sub>. Each of the following three statements can be seen as an augmented pattern involving only the relation R<sub>&lt;</sub>:

$$R\_{<}(a,a) \tag{1}$$

$$R\_{<}(a,b) \land R\_{<}(b,a) \tag{2}$$

$$R\_{<}(a,b) \land R\_{<}(b,c) \land R\_{<}(c,a) \tag{3}$$

By forbidding these three patterns, we impose that R<sub>&lt;</sub> is an irreflexive, antisymmetric relation with no length-3 cycles. In the following we only consider instances in which R<sub>&lt;</sub> is total in the sense that for all distinct a, b, we have R<sub>&lt;</sub>(a, b) or R<sub>&lt;</sub>(b, a). It is easy to see that this implies that R<sub>&lt;</sub> is a strict total order (since, in particular, forbidding pattern (3) corresponds to transitivity). From now on, for notational convenience, we use the operator < instead of the relation R<sub>&lt;</sub>, i.e. we write a < b instead of R<sub>&lt;</sub>(a, b). If we also forbid the augmented pattern shown in Fig. 6(a), then we not only impose an order on the points of an instance, but we also impose that there is a corresponding order on the variables which is consistent with this order on the points.
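The claim that totality together with the absence of patterns (1), (2) and (3) yields a strict total order can be checked exhaustively on a small point set. The following is a minimal sketch, assuming a three-point set and hypothetical helper names (`avoids_patterns`, `is_total`, `is_transitive`, `surviving_relations`). It enumerates every binary relation on the points and keeps those that are total and contain none of the three patterns.

```python
from itertools import product

POINTS = (0, 1, 2)
PAIRS = list(product(POINTS, repeat=2))

def avoids_patterns(rel):
    """True if none of the patterns (1), (2), (3) occurs in the relation."""
    no_loop = all((a, a) not in rel for a in POINTS)                      # (1)
    no_2cycle = all(not ((a, b) in rel and (b, a) in rel)
                    for a, b in PAIRS)                                     # (2)
    no_3cycle = all(not ((a, b) in rel and (b, c) in rel and (c, a) in rel)
                    for a, b, c in product(POINTS, repeat=3))              # (3)
    return no_loop and no_2cycle and no_3cycle

def is_total(rel):
    """True if every pair of distinct points is related one way or the other."""
    return all((a, b) in rel or (b, a) in rel for a, b in PAIRS if a != b)

def is_transitive(rel):
    return all((a, c) in rel
               for a, b, c in product(POINTS, repeat=3)
               if (a, b) in rel and (b, c) in rel)

def surviving_relations():
    """All relations on POINTS that are total and avoid patterns (1)-(3)."""
    result = []
    for bits in product((False, True), repeat=len(PAIRS)):
        rel = {pair for pair, keep in zip(PAIRS, bits) if keep}
        if is_total(rel) and avoids_patterns(rel):
            result.append(rel)
    return result

survivors = surviving_relations()
print(len(survivors), all(is_transitive(r) for r in survivors))
```

On three points, the surviving relations are exactly the 3! = 6 strict total orders, and each of them is transitive.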

If we also forbid the augmented pattern in Fig. 6(b), then we are saying that there is a total ordering of the variables of the instance such that each variable is constrained by at most one previous variable in this order. The set of completely-specified instances with a total ordering on its points in which none of these five augmented patterns occurs corresponds exactly to the set of instances whose constraint graph is acyclic. It is well known that this class of binary CSP instances is tractable since it is solved by arc consistency [22]. Recall that no finite set of forbidden *flat* patterns defines the set of acyclic instances [11]. This example demonstrates the power of augmented patterns compared to flat patterns, since acyclicity can be defined by forbidding a set of just five augmented patterns.
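The characterisation of acyclic constraint graphs by a variable ordering in which each variable is constrained by at most one previous variable can be turned into a simple test. The sketch below is an illustration, not the authors' algorithm: it works directly on the constraint graph, given as a list of variable pairs, and builds the ordering backwards by repeatedly removing a variable with at most one remaining neighbour. The function name `one_predecessor_order` is hypothetical.

```python
def one_predecessor_order(variables, constraints):
    """Return an order in which each variable is constrained by at most one
    earlier variable, or None if no such order exists."""
    neighbours = {v: set() for v in variables}
    for x, y in constraints:
        neighbours[x].add(y)
        neighbours[y].add(x)
    remaining = set(variables)
    reversed_order = []
    while remaining:
        # A variable with at most one remaining neighbour can safely be
        # placed last among the remaining variables.
        leaf = next((v for v in sorted(remaining)
                     if len(neighbours[v] & remaining) <= 1), None)
        if leaf is None:
            # Every remaining variable has two or more remaining neighbours,
            # so the last of them in any order has two earlier neighbours.
            return None
        reversed_order.append(leaf)
        remaining.remove(leaf)
    return reversed_order[::-1]

path = [("a", "b"), ("b", "c")]
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
print(one_predecessor_order(["a", "b", "c"], path))
print(one_predecessor_order(["a", "b", "c"], triangle))
```

An order is returned exactly when the constraint graph is a forest; for the triangle the function returns `None`.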

In fact, for any fixed k ≥ 1, we can define the class of instances with treewidth bounded by k using a finite set of augmented patterns. We saw above that the patterns (1), (2), (3) together with the pattern shown in Fig. 6(a) effectively allow us to impose an order on variables. Apart from this variable-order relation, we also introduce another binary relation *IE* (for Induced Edge between two variables in the constraint graph) which, using the same idea as in Fig. 6(a), is also effectively a relation on variables. For simplicity of presentation, in the following, we apply < and *IE* to variables rather than points. We also require the relation $\overline{IE}$ and we will consider only those instances in which *IE* and $\overline{IE}$ cover all pairs of variables. To ensure that $\overline{IE}$ is the complement of *IE* we forbid the augmented pattern

$$IE(x,y) \land \overline{IE}(x,y)$$

The semantics of the induced-edge relation *IE* is given by the following rules:


**Fig. 6.** Examples of augmented patterns.

3. If x<z, y<z, *IE*(x, z) and *IE*(y, z), then *IE*(x, y).

These rules can easily be coded using forbidden augmented patterns involving <, *IE* and $\overline{IE}$. Symmetry is coded by the forbidden pattern

$$IE(x,y) \land \overline{IE}(y,x)$$

Rule 2, above, can be imposed by forbidding the augmented pattern shown in Fig. 7. Rule 3 can be coded by the forbidden pattern:

$$(x < z) \land (y < z) \land IE(x, z) \land IE(y, z) \land \overline{IE}(x, y)$$

In order to impose a bound of k on the treewidth of the constraint graph, there must exist a total variable order and relations *IE*, $\overline{IE}$ (that cover all pairs of variables) such that the following augmented pattern does not occur:

$$(x\_1 < z) \land \dots \land (x\_{k+1} < z) \land IE(x\_1, z) \land \dots \land IE(x\_{k+1}, z)$$

This corresponds to a well-known characterisation of graphs with bounded treewidth as subgraphs of k-trees [22,24]. This example illustrates the fact that we need to apply a filter to the set of instances I defined by forbidding a set of augmented patterns. In this case, the filter is that I is completely specified, < is a total order on variables and *IE*, $\overline{IE}$ form a cover. When defining tractability of augmented patterns, we are only concerned with deciding instances satisfying the filter.
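For a fixed variable order, the smallest induced-edge relation satisfying rule 3 can be computed by processing the variables from last to first, and the final forbidden pattern then amounts to bounding the number of earlier induced neighbours of each variable. The sketch below is an illustration under an explicit assumption: every edge of the constraint graph is taken to be an induced edge, which is the usual starting point for induced graphs. The function name `induced_width` is hypothetical.

```python
def induced_width(order, edges):
    """Largest number of earlier induced neighbours of any variable,
    for the given variable order and constraint-graph edges."""
    position = {v: i for i, v in enumerate(order)}
    # earlier[z] collects the variables x < z with IE(x, z).
    earlier = {v: set() for v in order}
    for x, y in edges:
        # Assumption: each constraint-graph edge is an induced edge.
        lo, hi = sorted((x, y), key=position.get)
        earlier[hi].add(lo)
    width = 0
    for z in reversed(order):
        preds = sorted(earlier[z], key=position.get)
        width = max(width, len(preds))
        # Rule 3: if x < z, y < z, IE(x, z) and IE(y, z), then IE(x, y).
        for i, x in enumerate(preds):
            for y in preds[i + 1:]:
                earlier[y].add(x)
    return width

cycle4 = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
star = [("a", "d"), ("b", "d"), ("c", "d")]
print(induced_width(list("abcd"), cycle4))
print(induced_width(list("abcd"), star), induced_width(list("dabc"), star))
```

The pattern with k + 1 earlier induced neighbours fails to occur for a given order exactly when `induced_width` returns at most k. The star example shows why the order is existentially quantified: the value depends on the order chosen, and only the best order witnesses the treewidth.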

Another example which motivates the use of augmented patterns is the study of tractable languages. All known tractable constraint languages are defined by the existence of a polymorphism (a pointwise closure operation) which guarantees tractability [27]. Indeed, tractability is guaranteed by the identities satisfied by the polymorphism [4]. The existence of a polymorphism satisfying any given set of identities can be stated in terms of a forbidden augmented pattern. Indeed, an augmented pattern can enforce the fact that the constraints of the instance must all have a polymorphism f and other patterns can enforce the identities that f must satisfy. By existentially quantifying over f we can then define the class of all instances whose constraints all have some majority polymorphism f, for example, or all of whose constraints have a Siggers polymorphism [39].

**Fig. 7.** An augmented pattern.

We illustrate this for weak near-unanimity polymorphisms, given their importance in the characterisation of tractable languages [3,41]. A binary CSP instance I has the k-ary polymorphism f : D<sup>k</sup> → D if for all binary relations R of I we have ∀(a<sub>1</sub>, b<sub>1</sub>), …, (a<sub>k</sub>, b<sub>k</sub>) ∈ R, (f(a<sub>1</sub>, …, a<sub>k</sub>), f(b<sub>1</sub>, …, b<sub>k</sub>)) ∈ R. The first step to expressing the fact that a binary CSP instance has the k-ary polymorphism f is to forbid the augmented pattern POLY<sub>k</sub>(f) shown in Fig. 8 for the case k = 4. A weak near-unanimity operation is a function f : D<sup>k</sup> → D satisfying the identities f(b, a, …, a) = f(a, b, a, …, a) = … = f(a, …, a, b). These identities are equivalent to forbidding each of the following augmented patterns

$$\begin{aligned} (f(b, a, \dots, a) = c) \land (f(a, b, a, \dots, a) = d) \land (c \neq d) \\ (f(b, a, \dots, a) = c) \land (f(a, a, b, a, \dots, a) = d) \land (c \neq d) \\ \vdots \\ (f(b, a, \dots, a) = c) \land (f(a, \dots, a, b) = d) \land (c \neq d) \end{aligned}$$

For some fixed k, after forbidding these augmented patterns (the polymorphism pattern POLY<sub>k</sub>(f) as illustrated in Fig. 8 together with the above patterns corresponding to the identities of a weak near-unanimity polymorphism of arity k), we obtain a set of instances. We then have to apply a filter so that we only keep those instances I = ⟨A<sub>I</sub>, ρ<sub>I</sub>⟩ in which f is a total function and such that all domains are closed under f, i.e. for all x ∈ X and for all a<sub>1</sub>, …, a<sub>k</sub> ∈ D such that (x, a<sub>i</sub>) ∈ A<sub>I</sub> (i = 1, …, k), we have (x, f(a<sub>1</sub>, …, a<sub>k</sub>)) ∈ A<sub>I</sub>. This example again illustrates the fact that tractability of augmented patterns depends on the existence of a polynomial-time algorithm to decide instances satisfying the corresponding filter.
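The two conditions expressed by these patterns, namely that f is a polymorphism of every binary relation and that f satisfies the weak near-unanimity identities, can be checked directly for a given function on a small domain. The following is a minimal sketch with hypothetical helper names (`is_polymorphism`, `is_weak_near_unanimity`, `majority`). It tests a candidate f by brute force and does not search for one.

```python
from itertools import product

def is_polymorphism(f, k, relation):
    """True if applying f componentwise to any k tuples of the binary
    relation yields a tuple of the relation."""
    rel = set(relation)
    return all(
        (f(*(a for a, _ in rows)), f(*(b for _, b in rows))) in rel
        for rows in product(rel, repeat=k)
    )

def is_weak_near_unanimity(f, k, domain):
    """True if f(b,a,...,a) = f(a,b,a,...,a) = ... = f(a,...,a,b)
    for all a, b in the domain."""
    for a, b in product(domain, repeat=2):
        values = {f(*[b if j == i else a for j in range(k)]) for i in range(k)}
        if len(values) > 1:
            return False
    return True

def majority(x, y, z):
    """Value taken by at least two arguments (on a two-element domain)."""
    return x if x == y or x == z else y

domain = (0, 1)
leq = {(0, 0), (0, 1), (1, 1)}   # the relation a <= b
neq = {(0, 1), (1, 0)}           # the relation a != b

print(is_weak_near_unanimity(majority, 3, domain), is_polymorphism(majority, 3, leq))
print(is_polymorphism(min, 2, neq))
```

Here the ternary majority operation satisfies the identities and preserves the relation ≤, whereas the binary operation `min` is not a polymorphism of ≠, since it maps the tuples (0, 1) and (1, 0) to (0, 0).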

Another motivating example involves a colouring of points. Suppose that both S<sub>1</sub> and S<sub>2</sub> are tractable sets of flat patterns. Then we know that S<sub>1</sub> + S<sub>2</sub> defines the tractable class of instances in which either S<sub>1</sub> does not occur or S<sub>2</sub> does not occur. The number of patterns in S<sub>1</sub> + S<sub>2</sub> is (in the worst case) quadratic in the size of S<sub>1</sub> and S<sub>2</sub>. We can give a set of augmented patterns which is linear in the size of S<sub>1</sub> and S<sub>2</sub> as follows. We augment each pattern in S<sub>1</sub> by colouring all its points red and each pattern in S<sub>2</sub> by colouring all its points green. We then add a pattern consisting of two points, one red and the other green. The set of instances for which there is a 2-colouring of its points in which none of these augmented patterns occurs is exactly the set of instances in f(S<sub>1</sub>) ∪ f(S<sub>2</sub>).

**Fig. 8.** Polymorphisms can be defined by forbidding augmented patterns, as illustrated for this arity-4 polymorphism f.

#### **7 Augmented Patterns: Definitions**

An *augmented pattern* is simply a flat pattern together with a conjunction of atomic formulas such as R<sub>i</sub>(p<sub>1</sub>, …, p<sub>a<sub>i</sub></sub>) where each R<sub>i</sub> is a relation (of arity a<sub>i</sub>) and p<sub>1</sub>, …, p<sub>a<sub>i</sub></sub> are points. An augmented pattern P occurs in another augmented pattern Q if there is a mapping from P to Q which corresponds to the occurrence of the flat version of P in the flat version of Q and which also preserves the new relation(s) R<sub>i</sub>. The new relation(s) R<sub>i</sub> may, for example, correspond to an order. As an example, the augmented pattern in Fig. 9(a) does not occur in the augmented pattern in Fig. 9(b) since the variable order is not preserved. On the other hand, the pattern P1 in Fig. 1 does occur in Fig. 9(b) since there is no variable order in P1 to preserve.

As a starting point, we can consider instances augmented with one or more new relation(s). In other words we consider structured instances (e.g. instances with an order on the variables). As usual, in order to establish a Galois connection, we have to consider the lattice of all generic instances including partially-specified instances (partial in the sense that certain pairs of points are joined by neither a negative nor a positive edge *or* the new relations do not form a cover, e.g. the variable order is only partial). The operations × and + and the functions f and g are defined as for sets of flat patterns. In particular, in P + Q there is no relation (e.g. no variable ordering) between the copies of P and Q in P + Q. The two lattice structures and the Galois connection between them follow from exactly the same arguments as for flat patterns.

**Fig. 9.** (a) The broken-triangle pattern (BTP). (b) An alternative pattern which defines the same class.

However, our aim is to consider the existential quantification of the relations (variable ordering, polymorphism, colouring) associated with a (set of) augmented pattern(s). As an example of an augmented pattern, consider the broken-triangle pattern (BTP) [16] shown in Fig. 9(a). We associate with this pattern all instances for which there is some variable ordering for which BTP does not occur. It turns out that, in the case of BTP, it is decidable in polynomial time whether such a variable ordering exists [16]. In general, each structured instance (e.g. an instance with new relations such as a variable ordering) has a corresponding flat version in which the new relations are forgotten, and our aim is to establish a Galois connection between sets of *flat* instances and augmented patterns.

We would like to establish a Galois connection between the set 𝒯 of sets of flat generic instances and the set of sets of augmented patterns, which we denote by 𝒮<sub>A</sub>. However, this does not seem possible. Instead we present in Sect. 8 a Galois connection between 𝒯 and Σ<sub>A</sub>, the set of sets of sets of augmented patterns. Each σ ∈ Σ<sub>A</sub> is a set of the form {S<sub>1</sub>, S<sub>2</sub>, …} where each S<sub>i</sub> ∈ 𝒮<sub>A</sub> is a set of patterns. Observe that since every element S of 𝒮<sub>A</sub> has a corresponding singleton element {S} in Σ<sub>A</sub>, we can consider Σ<sub>A</sub> as an extension of 𝒮<sub>A</sub>. We extend our definition of ⇝ from 𝒮<sub>A</sub> to Σ<sub>A</sub> as follows: σ<sub>1</sub> ⇝ σ<sub>2</sub> if ∀S<sub>2</sub> ∈ σ<sub>2</sub>, ∃S<sub>1</sub> ∈ σ<sub>1</sub> such that S<sub>1</sub> ⇝ S<sub>2</sub>. We then take Σ<sub>A</sub> to be the set of equivalence classes with respect to the equivalence relation ↭, i.e. we identify equivalent elements of Σ<sub>A</sub>.

We first have to understand the lattice structure of ⟨Σ<sub>A</sub>, ≤⟩, where σ<sub>1</sub> ≤ σ<sub>2</sub> if and only if σ<sub>2</sub> ⇝ σ<sub>1</sub>. The meet and join operations of this lattice are the operations + and ∪. This follows from the following lemmas.

**Lemma 5.** *For* σ, σ<sub>1</sub>, σ<sub>2</sub> ∈ Σ<sub>A</sub>*, if* σ<sub>1</sub> ⇝ σ *and* σ<sub>2</sub> ⇝ σ *then* (σ<sub>1</sub> + σ<sub>2</sub>) ⇝ σ*.*

*Proof.* Suppose that σ<sub>1</sub> ⇝ σ and σ<sub>2</sub> ⇝ σ and consider any S ∈ σ. We have ∃S<sub>i</sub> ∈ σ<sub>i</sub> such that S<sub>i</sub> ⇝ S (i = 1, 2). So ∀P ∈ S, ∃P<sub>i</sub> ∈ S<sub>i</sub> such that P<sub>i</sub> → P (i = 1, 2). Thus P<sub>1</sub> + P<sub>2</sub> → P and hence S<sub>1</sub> + S<sub>2</sub> ⇝ S. It follows that (σ<sub>1</sub> + σ<sub>2</sub>) ⇝ σ.

**Lemma 6.** *For* σ, σ<sub>1</sub>, σ<sub>2</sub> ∈ Σ<sub>A</sub>*, if* σ ⇝ σ<sub>1</sub> *and* σ ⇝ σ<sub>2</sub> *then* σ ⇝ (σ<sub>1</sub> ∪ σ<sub>2</sub>)*.*

*Proof.* If σ ⇝ σ<sub>1</sub> and σ ⇝ σ<sub>2</sub>, then ∀S<sub>i</sub> ∈ σ<sub>i</sub>, ∃S ∈ σ such that S ⇝ S<sub>i</sub> (i = 1, 2). Hence, σ ⇝ (σ<sub>1</sub> ∪ σ<sub>2</sub>).

We fix a relational signature. Indeed, for simplicity of presentation, in the following we assume that there is a single new relation Rel of a fixed arity a (which could be the cartesian product of several relations). We denote by REL the set of all possible functions from the set of (flat) instances to the set of relations of arity a. Thus, given a flat instance I ∈ ℐ and a function Rel ∈ REL, ⟨I, Rel(I)⟩ is an augmented version of I (e.g. the instance I with an ordering on its variables). We can now define occurrence of a set S ∈ 𝒮<sub>A</sub> of augmented patterns in an instance I ∈ ℐ as ∀Rel ∈ REL, ∃P<sub>A</sub> ∈ S such that P<sub>A</sub> → ⟨I, Rel(I)⟩. Hence, S does not occur in I if

$$
\exists Rel \in \mathcal{REL} \text{ such that } \forall P\_A \in S, \ P\_A \nrightarrow \langle I, Rel(I) \rangle.
$$

Thus occurrence of a *set* S of augmented patterns depends on a single quantification over REL. This is the reason why we need to consider sets of sets of augmented patterns to obtain a Galois connection.
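The effect of this single quantification can be seen on a deliberately tiny example. The sketch below is an illustration under stated assumptions: an instance is reduced to its list of variables, the candidate relations are restricted to total orders on these variables, and a toy augmented pattern is a pair (x, y) that occurs exactly when x precedes y. None of this is the paper's formal setting, and the names `occurs`, `set_avoidable` and `in_F` are hypothetical.

```python
from itertools import permutations

def occurs(pattern, order):
    """Toy augmented pattern (x, y): occurs if x precedes y in the order."""
    x, y = pattern
    return order.index(x) < order.index(y)

def set_avoidable(patterns, variables):
    """True if one single order avoids every pattern of the set."""
    return any(all(not occurs(p, order) for p in patterns)
               for order in permutations(variables))

def in_F(sigma, variables):
    """Membership test for F(sigma): each set in sigma may use its own order."""
    return all(set_avoidable(s, variables) for s in sigma)

variables = ["u", "v"]
A = ("u", "v")   # occurs when u < v
B = ("v", "u")   # occurs when v < u

print(in_F([{A}, {B}], variables))   # two sets, two independent orders
print(in_F([{A, B}], variables))     # one set, one common order
```

With σ = {{A}, {B}} each set can be avoided by its own order, whereas the single set {A, B} cannot be avoided by any one order. The set of sets {{A}, {B}} therefore does not behave like the union {A, B}, which is the distinction that the extension from 𝒮<sub>A</sub> to Σ<sub>A</sub> captures.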

## **8 A Galois Connection for Augmented Patterns**

In order to establish a Galois connection between Σ<sub>A</sub> and 𝒯, we require the following functions F : Σ<sub>A</sub> → 𝒯 and G : 𝒯 → Σ<sub>A</sub>.

$$\begin{aligned} F(\sigma) &= \{ I \in \mathcal{I} \mid \forall S \in \sigma, \exists Rel \in \mathcal{REL} \text{ such that } \forall P \in S, P \nrightarrow \langle I, Rel(I) \rangle \} \\ G(T) &= \{ S \in \mathcal{S}\_{\mathcal{A}} \mid \forall I \in T, \exists Rel \in \mathcal{REL} \text{ such that } \forall P \in S, P \nrightarrow \langle I, Rel(I) \rangle \} \end{aligned}$$

To give a concrete example to illustrate the definition of F, if S contains patterns which when forbidden impose that Rel is a partial order on the variables, then F({S}) only contains instances equipped with a partial order on their variables. As in the case of BTP, we may want to impose a total order on the variables. F({S}) contains many instances which are either incompletely specified or for which Rel is not total; such instances can be recognised (and filtered out) in polynomial time and thus are irrelevant for deciding whether S is tractable or not, but are essential for the Galois connection. This is analogous to the Galois connection for flat patterns, where f(S) included incompletely-specified instances.

Given a set of instances T, there may be more than one way of describing T using forbidden augmented patterns. For example, let S<sup>1</sup> be the set of augmented patterns imposing a partial order on variables (as described in Sect. 6) together with the pattern BTP shown in Fig. 9(a), and let S<sup>2</sup> be identical to S<sup>1</sup> except that BTP is replaced by the pattern in Fig. 9(b). It is easy to see that F({S1}) = F({S2}). Hence, if T = F({S1}), then S1, S<sup>2</sup> ∈ G(T).

**Theorem 6.** *The functions* F *and* G *define an antitone Galois connection between* Σ*<sup>A</sup> and* T *.*

*Proof.* To show that we have an antitone Galois connection between Σ<sub>A</sub> and 𝒯, it suffices to show that ∀σ ∈ Σ<sub>A</sub>, ∀T ∈ 𝒯, T ≤ F(σ) ⇔ σ ≤ G(T). This corresponds to (T → F(σ)) ⇔ (G(T) ⇝ σ).

By definition, T → F(σ) if and only if ∀I<sub>T</sub> ∈ T, ∃I ∈ ℐ with I<sub>T</sub> → I and such that ∀S ∈ σ, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩. Thus T → F(σ) if and only if ∀I<sub>T</sub> ∈ T, ∀S ∈ σ, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I<sub>T</sub>, Rel(I<sub>T</sub>)⟩.

On the other hand, G(T) ⇝ σ if and only if ∀S ∈ σ, ∃S′ ∈ 𝒮<sub>A</sub> with S′ ⇝ S and such that ∀I ∈ T, ∃Rel ∈ REL such that ∀P ∈ S′, P ↛ ⟨I, Rel(I)⟩. Thus G(T) ⇝ σ if and only if ∀S ∈ σ, ∀I ∈ T, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩.

We therefore have (T → F(σ)) ⇔ (G(T) ⇝ σ), which completes the proof.

The Galois connection is similar to the Galois connection between T and S, as demonstrated by the following results.

**Theorem 7.** *For all* σ<sub>1</sub>, σ<sub>2</sub> ∈ Σ<sub>A</sub>*,* F(σ<sub>1</sub> + σ<sub>2</sub>) = F(σ<sub>1</sub>) ∪ F(σ<sub>2</sub>)*.*

*Proof.* F(σ<sub>1</sub> + σ<sub>2</sub>) = {I ∈ ℐ | ∀S ∈ σ<sub>1</sub> + σ<sub>2</sub>, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩} = {I ∈ ℐ | ∀S<sub>1</sub> ∈ σ<sub>1</sub>, ∀S<sub>2</sub> ∈ σ<sub>2</sub>, ∃Rel ∈ REL such that ∀P<sub>1</sub> ∈ S<sub>1</sub>, ∀P<sub>2</sub> ∈ S<sub>2</sub>, P<sub>1</sub> + P<sub>2</sub> ↛ ⟨I, Rel(I)⟩}. But P<sub>1</sub> + P<sub>2</sub> ↛ ⟨I, Rel(I)⟩ if and only if P<sub>1</sub> ↛ ⟨I, Rel(I)⟩ or P<sub>2</sub> ↛ ⟨I, Rel(I)⟩ (by an immediate generalisation of Lemma 1 to augmented patterns). Furthermore, ∀P<sub>1</sub> ∈ S<sub>1</sub>, ∀P<sub>2</sub> ∈ S<sub>2</sub>, P<sub>1</sub> ↛ ⟨I, Rel(I)⟩ or P<sub>2</sub> ↛ ⟨I, Rel(I)⟩ if and only if ∀P<sub>1</sub> ∈ S<sub>1</sub>, P<sub>1</sub> ↛ ⟨I, Rel(I)⟩ or ∀P<sub>2</sub> ∈ S<sub>2</sub>, P<sub>2</sub> ↛ ⟨I, Rel(I)⟩. From all this, it follows that F(σ<sub>1</sub> + σ<sub>2</sub>) = {I ∈ ℐ | ∀S<sub>1</sub> ∈ σ<sub>1</sub>, ∃Rel ∈ REL such that ∀P ∈ S<sub>1</sub>, P ↛ ⟨I, Rel(I)⟩} ∪ {I ∈ ℐ | ∀S<sub>2</sub> ∈ σ<sub>2</sub>, ∃Rel ∈ REL such that ∀P ∈ S<sub>2</sub>, P ↛ ⟨I, Rel(I)⟩} = F(σ<sub>1</sub>) ∪ F(σ<sub>2</sub>).

**Theorem 8.** *For all* σ<sub>1</sub>, σ<sub>2</sub> ∈ Σ<sub>A</sub>*,* F(σ<sub>1</sub> ∪ σ<sub>2</sub>) = F(σ<sub>1</sub>) ∩ F(σ<sub>2</sub>)*.*

*Proof.* F(σ<sub>1</sub> ∪ σ<sub>2</sub>) = {I ∈ ℐ | ∀S ∈ σ<sub>1</sub> ∪ σ<sub>2</sub>, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩} = {I ∈ ℐ | ∀S ∈ σ<sub>1</sub>, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩} ∩ {I ∈ ℐ | ∀S ∈ σ<sub>2</sub>, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩} = F(σ<sub>1</sub>) ∩ F(σ<sub>2</sub>).

The lattice structure of Σ<sub>A</sub> and Theorems 7 and 8 are illustrated in Fig. 10.

**Theorem 9.** *For all* T<sub>1</sub>, T<sub>2</sub> ∈ 𝒯*,* G(T<sub>1</sub> ∪ T<sub>2</sub>) = G(T<sub>1</sub>) ∩ G(T<sub>2</sub>)*.*

*Proof.* G(T<sub>1</sub> ∪ T<sub>2</sub>) = {S ∈ 𝒮<sub>A</sub> | ∀I ∈ T<sub>1</sub> ∪ T<sub>2</sub>, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩} = {S ∈ 𝒮<sub>A</sub> | ∀I ∈ T<sub>1</sub>, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩} ∩ {S ∈ 𝒮<sub>A</sub> | ∀I ∈ T<sub>2</sub>, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩} = G(T<sub>1</sub>) ∩ G(T<sub>2</sub>).

**Fig. 10.** The function F from Σ<sub>A</sub> to 𝒯

**Theorem 10.** *For all* T<sub>1</sub>, T<sub>2</sub> ∈ 𝒯*,* G(T<sub>1</sub> × T<sub>2</sub>) = G(T<sub>1</sub>) ∪ G(T<sub>2</sub>)*.*

*Proof.* G(T<sub>1</sub> × T<sub>2</sub>) = {S ∈ 𝒮<sub>A</sub> | ∀I ∈ T<sub>1</sub> × T<sub>2</sub>, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩} = {S ∈ 𝒮<sub>A</sub> | ∀I<sub>1</sub> ∈ T<sub>1</sub>, ∀I<sub>2</sub> ∈ T<sub>2</sub>, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I<sub>1</sub> × I<sub>2</sub>, Rel(I<sub>1</sub> × I<sub>2</sub>)⟩}. Now, for any Rel ∈ REL, ⟨I<sub>1</sub>, Rel(I<sub>1</sub>)⟩ × ⟨I<sub>2</sub>, Rel(I<sub>2</sub>)⟩ → ⟨I<sub>1</sub> × I<sub>2</sub>, Rel(I<sub>1</sub> × I<sub>2</sub>)⟩. Thus P ↛ ⟨I<sub>1</sub> × I<sub>2</sub>, Rel(I<sub>1</sub> × I<sub>2</sub>)⟩ implies P ↛ ⟨I<sub>1</sub>, Rel(I<sub>1</sub>)⟩ × ⟨I<sub>2</sub>, Rel(I<sub>2</sub>)⟩ which (by an immediate extension of Lemma 2 to augmented patterns) is equivalent to (P ↛ ⟨I<sub>1</sub>, Rel(I<sub>1</sub>)⟩) ∨ (P ↛ ⟨I<sub>2</sub>, Rel(I<sub>2</sub>)⟩). It follows from the above that G(T<sub>1</sub> × T<sub>2</sub>) ⊆ {S ∈ 𝒮<sub>A</sub> | ∀I<sub>1</sub> ∈ T<sub>1</sub>, ∀I<sub>2</sub> ∈ T<sub>2</sub>, ∃Rel ∈ REL such that (P ↛ ⟨I<sub>1</sub>, Rel(I<sub>1</sub>)⟩) ∨ (P ↛ ⟨I<sub>2</sub>, Rel(I<sub>2</sub>)⟩)}. But the latter is equal to {S ∈ 𝒮<sub>A</sub> | (∀I<sub>1</sub> ∈ T<sub>1</sub>, ∃Rel ∈ REL such that P ↛ ⟨I<sub>1</sub>, Rel(I<sub>1</sub>)⟩) ∨ (∀I<sub>2</sub> ∈ T<sub>2</sub>, ∃Rel ∈ REL such that P ↛ ⟨I<sub>2</sub>, Rel(I<sub>2</sub>)⟩)} = G(T<sub>1</sub>) ∪ G(T<sub>2</sub>). Thus G(T<sub>1</sub> × T<sub>2</sub>) ⊆ G(T<sub>1</sub>) ∪ G(T<sub>2</sub>).

In order to show G(T<sub>1</sub>) ∪ G(T<sub>2</sub>) ⊆ G(T<sub>1</sub> × T<sub>2</sub>), and hence to complete the proof, without loss of generality, we only need to show G(T<sub>1</sub>) ⊆ G(T<sub>1</sub> × T<sub>2</sub>). Consider S ∈ G(T<sub>1</sub>). We have ∀I<sub>1</sub> ∈ T<sub>1</sub>, ∃Rel<sub>1</sub> ∈ REL such that ∀P ∈ S, P ↛ ⟨I<sub>1</sub>, Rel<sub>1</sub>(I<sub>1</sub>)⟩. Therefore, for all common factors I of I<sub>1</sub> and I<sub>2</sub>, ∃Rel ∈ REL such that ∀P ∈ S, P ↛ ⟨I, Rel(I)⟩. Indeed, we can clearly choose Rel = Rel<sub>1</sub> for each such I. Now I<sub>1</sub> × I<sub>2</sub> is the juxtaposition of copies of such common factors I. These copies are comprised of disjoint sets of points. For each such copy of a common factor I composing I<sub>1</sub> × I<sub>2</sub>, there is a corresponding version of Rel<sub>1</sub>(I) which we denote by Rel<sub>I</sub>(I). The relations Rel<sub>I</sub>(I) are disjoint (since within I<sub>1</sub> × I<sub>2</sub> each common factor I is comprised of disjoint sets of points). Let R be the relation which is the union of all these Rel<sub>I</sub>(I). Then ∃Rel ∈ REL such that Rel(I<sub>1</sub> × I<sub>2</sub>) = R. Now ∀P ∈ S, P ↛ ⟨I<sub>1</sub> × I<sub>2</sub>, Rel(I<sub>1</sub> × I<sub>2</sub>)⟩. Therefore S ∈ G(T<sub>1</sub> × T<sub>2</sub>), which completes the proof.

Theorems 9 and 10 are illustrated in Fig. 11.

**Fig. 11.** The function G from 𝒯 to Σ<sub>A</sub>

In order to define tractability of sets of augmented patterns we must apply a filter to instances so that we only consider completely-specified instances with a certain property. Examples of filters include the property that an ordering relation is total or that two relations (such as the relations *IE* and $\overline{IE}$ that we introduced in Sect. 6) form a cover of all pairs of assignments to distinct variables. For example, in the case of BTP, we are only interested in instances equipped with a total ordering on the variables, since the pattern shown in Fig. 9(a) trivially does not occur on variables which are not ordered. This leads to the following definition of tractability.

**Definition 3.** *Let* F *be a property of instances* I ∈ I *that can be verified in polynomial time. We say that* σ ∈ Σ*<sup>A</sup> is* tractable *(with respect to the filter* F*) if there is a polynomial-time algorithm to decide the set of completely-specified instances in* F(σ) *(which satisfy the filter* F*). In particular, we say that* S ∈ S*<sup>A</sup> is tractable (w.r.t.* F*) if* {S} *is tractable (w.r.t.* F*).*

**Proposition 9.** *The tractable elements of* Σ<sub>A</sub> *form a sublattice. Furthermore, the tractable sets of augmented patterns form a join semi-lattice of* 𝒮<sub>A</sub>*.*

*Proof.* If σ<sub>1</sub>, σ<sub>2</sub> ∈ Σ<sub>A</sub> are tractable, then so are σ<sub>1</sub> + σ<sub>2</sub> and σ<sub>1</sub> ∪ σ<sub>2</sub>. This follows immediately from the fact that F(σ<sub>1</sub> + σ<sub>2</sub>) = F(σ<sub>1</sub>) ∪ F(σ<sub>2</sub>) and F(σ<sub>1</sub> ∪ σ<sub>2</sub>) = F(σ<sub>1</sub>) ∩ F(σ<sub>2</sub>). The tractable sets of augmented patterns form a join semi-lattice, since S<sub>1</sub>, S<sub>2</sub> tractable implies that S<sub>1</sub> + S<sub>2</sub> is tractable.

#### **9 Discussion and Conclusion**

In this paper we have initiated the study of the Galois connection between lattices of sets of forbidden patterns and sets of instances. The consequences of this Galois connection for expressibility and tractability questions remain largely unexplored. However, we have shown that the tractable sets of patterns form a sub-lattice.

Augmented patterns provide a rich language in which we can define many interesting classes of instances in a concise form, notably by adding an order on the variables or the values. We have seen that both bounded treewidth and the existence of a polymorphism satisfying a set of identities can be expressed using augmented patterns (together with a filter on the set of instances). This leads to an orthogonal question of the tractability of the recognition of classes defined by augmented patterns. For example, given a binary CSP instance, it is NP-hard to determine whether there exists an ordering of the values under which all relations are max-closed [25]. On the other hand, it is tractable to decide whether the relations have a conservative Mal'tsev polymorphism [6]. Determining the tractability frontier of this meta-problem is an open question for augmented patterns. As we have pointed out, the recognition problem is always tractable for finite sets of flat patterns.

It is natural to ask whether the Feder-Vardi dichotomy [23] (for classes of CSP instances defined by finite languages of constraint relations) generalises to classes of CSP instances defined by augmented patterns. However, we know that no such P/NP-hard dichotomy can exist by the work on lifted patterns by Kun and Nešetřil [33] and by Ladner's theorem [34]. It is an open question whether classes of CSP instances defined by forbidding *flat* patterns exhibit a dichotomy in the following sense: all finite sets of patterns are either tractable or NP-complete. We conjecture that this is true.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Author Index

Aleksiev, Teodor 107 Alivanistos, Dimitrios 107

Baxter, Matt 3 Bernert, Marie 17

Cochez, Michael 107 Cohen, David A. 125 Cooper, Martin C. 125

Daza, Daniel 107 Dhillon, Pyara 33

Galindo, Mauricio Javier Osorio 42

Jeavons, Peter G. 125

Kazantsev, Nikolai 93 Kishore, Beena 33

Laurier, Wim 3

Mann, Graham 33 Moreno, Luis Angel Montiel 42

Polovina, Simon 3 Priss, Uta 72

Ramparany, Fano 17 Rosing, Mark von 3

Sonea, Ovidiu-Dan 84 Stalker, Iain Duncan 93

van Bakel, Ruud 107

Živný, Stanislav 125